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Abstract 

Compressive sensing (CS) has recently emerged as a framework for efficiently capturing 
signals that are sparse or compressible in an appropriate basis. While often motivated as an al- 
ternative to Nyquist-rate sampling, there remains a gap between the discrete, finite-dimensional 
CS framework and the problem of acquiring a continuous-time signal. In this paper, we attempt 
to bridge this gap by exploiting the Discrete Prolate Spheroidal Sequences (DPSS's), a collec- 
tion of functions that trace back to the seminal work by Slepian, Landau, and Pollack on the 
effects of time-limiting and bandlimiting operations. DPSS's form a highly efficient basis for 
sampled bandlimited functions; by modulating and merging DPSS bases, we obtain a dictio- 
nary that offers high-quality sparse approximations for most sampled multiband signals. This 
multiband modulated DPSS dictionary can be readily incorporated into the CS framework. We 
provide theoretical guarantees and practical insight into the use of this dictionary for recovery 
of sampled multiband signals from compressive measurements. 

Keywords: Compressive sensing, multiband signals. Discrete Prolate Spheroidal Sequences, Fourier 
analysis, sampling, block-sparsity, approximation, signal recovery, greedy algorithms 

1 Introduction 

1.1 Compressive sensing of analog signals 

In recent decades the digital signal processing community has enjoyed enormous success in devel- 
oping hardware and algorithms for capturing and extracting information from signals. Capitalizing 
on the early work of Whitaker, Nyquist, Kotelnikov, and Shannon on the sampling and representa- 
tion of continuous signals, signal processing has moved from the analog to the digital domain and 
ridden the wave of Moore's law. Digitization has enabled the creation of sensing and processing 
systems that are more robust, flexible, cheaper and, therefore, more ubiquitous than their analog 
counterparts. 

The foundation of this progress has been the Nyquist sampling theorem, which states that in 
order to perfectly capture the information in an arbitrary continuous-time signal x{t) with bandlimit 

* Corresponding author. Phone: (303) 273-3607. Fax: (303) 273-3602. 

This work was partially supported by NSF grants DMS-1004718 and CCF-0830320, DARPA grant FA8650-08- 
C-7853, and AFOSR grant FA9550-09- 1-0465. 



-^Y^ Hz, we must sample the signal at its Nyquist rate of -Bnyq samples/see. This requirement has 
placed a growing burden on analog-to-digital converters as applications that require processing 
signals of ever-higher bandwidth lead to ever-higher sampling rates. This pushes these devices 
toward a physical barrier, beyond which their design becomes increasingly difficult and costly [77] . 

In recent years, compressive sensing (CS) has emerged as a framework that can significantly 
reduce the acquisition cost at a sensor [2, 10, 28]. CS builds on the work of Candcs, Romberg, and 
Tao [13] and Donoho [28], who showed that a signal that can be compressed using classical methods 
such as transform coding can also be efficiently acquired via a small set of nonadaptive, linear, and 
usually randomized measurements. 

There remains, however, a prominent gap between the theoretical framework of CS, which deals 
with acquiring finite-length, discrete signals that are sparse or compressible in a basis or dictionary, 
and the problem of acquiring a continuous-time signal. Previous work has attempted to bridge 
this gap by employing two very different strategies. First, works such as [73] have employed a 
simple multitone analog signal model that maps naturally into a finite-dimensional sparse model. 
Although this assumption allows the reconstruction problem to be formulated directly within the 
CS framework, the multitone model can be unrealistic for many analog signals of practical interest. 
Alternatively, other authors have considered a more plausible multiband analog signal model that is 
also amenable to sub- Nyquist sampling [8,34,35,47,48,75]. These works, however, have involved 
customized sampling protocols and reconstruction formulations that fall largely outside of the 
standard CS framework. Indeed, some of this body of literature and many of its underlying ideas 
actually pre-date the very existence of CS. 

1.2 Contributions and paper organization 

In this paper, we bridge this gap in a different manner. Namely, we show that when dealing with 
finite-length windows of samples, it is possible to map the multiband analog signal model — in a very 
natural way — into a finite-dimensional sparse model. One can then apply many of the standard 
theoretical tools of CS to develop algorithms for both recovery as well as compressive domain 
processing of multiband signals. 

Our work actually rests on ideas that trace back to the classical signal processing literature 
and the study of time-frequency localization. The Weyl-Heisenberg uncertainty principle states 
that a signal cannot be simultaneously localized on a finite interval in both time and frequency. 
A natural question is to what extent it is possible to concentrate a signal x{t) and its continuous- 
time Fourier transform (CTFT) X(F) near finite intervals. In an extraordinary series of papers 
from the 1960s and 1970s, Slepian, Landau, and Pollack provide an in-depth investigation into 
this question [42,43,63,65,67]. The implications of this body of work have had a tremendous 
impact across a number of disciplines within mathematics and engineering, particularly in the field 
of spectral estimation and harmonic analysis (e.g., [69]). Very few of these ideas have appeared 
in the CS literature, however, and so one goal of this paper is to carefully explain — from a CS 
perspective — the natural role that these ideas can indeed play in CS. 

We begin this paper in Sections 2 and 3 with a description of our problem setup and a survey 
of the necessary CS background material. In Section 4, we introduce the multitone and multiband 
analog signal models. We then discuss how sparse representations for multiband signals can be 
incorporated into the CS framework through the use of Discrete Prolate Spheroidal Sequences 
(DPSS's) [65]. First described by Slepian in 1978, DPSS's form a highly efficient basis for sampled 
bandlimited functions. For the sake of clarity and completeness, we provide a self-contained review 
of the key results from Slepian's work that are most relevant to the problem of modeling sampled 
multiband signals. We then explain how, by modulating and merging DPSS bases, one obtains a 
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dictionary that — to a very high degree of approximation — provides a sparse representation for most 
finite-length, Nyquist-rate sample vectors arising from multiband analog signals. We also explain 
why the qualifiers "approximation" and "most" in the preceding sentence are necessary; however, 
we characterize them formally and justify the use of the multiband modulated DPSS dictionary in 
practical settings. 

In Section 5, we discuss the use of the multiband modulated DPSS dictionary for recovery of 
sampled multiband signals from compressive measurements. We discuss the implications (in terms 
of formulating reconstruction procedures and guaranteeing their performance) of the fact that our 
dictionary is not quite orthogonal; in fact, it may be undcrconiplete or overcomplete, depending 
on the setting of a user-defined parameter. We also provide theoretical guarantees for recovery 
algorithms that exploit the block-sparse nature of signal expansions in our dictionary. Ultimately, 
this allows us to guarantee that most finite-length sample vectors arising from multiband analog 
signals can — to a high degree of approximation — be recovered from a number of measurements that 
is proportional to the underlying information level (also known as the Landau rate [41]). 

In Section 6, we present the results of a detailed suite of simulations for signal recovery from 
compressive measurements, illustrating the effectiveness of our proposed approaches on realistic 
signals. We show that the reconstruction quality achieved using the multiband modulated DPSS 
dictionary is far better than what is achieved using the discrete Fourier transform (DFT) as a 
sparsifying basis. These results confirm that a DPSS-based dictionary can provide a very attractive 
alternative to the DFT for sparse recovery. We conclude in Section 7 with a final discussion and 
directions for future work. 

1.3 Relation to existing work 

Although customized measurement and reconstruction schemes [8, 34, 35, 47, 48, 75] have previously 
been proposed for efficiently sampling multiband signals, we believe that our paper is of independent 
interest from these works, specifically because we restrict ourselves to operating within the finite- 
dimensional CS framework. There are a variety of plausible CS (and even non-CS) scenarios where 
a sparse representation of a finite-length Nyquist-rate sample vector would be useful, and it is this 
problem to which we devote our attention. This work may be of interest, for example, to any 
practitioner who has struggled with the lack of sparsity that the DFT dictionary provides even 
for pure sampled tones at "off-grid" frequencies. Moreover, as we discuss more fully in Section 2, 
several analog CS hardware architectures can be viewed as providing random projections of finite- 
length, Nyquist-rate sample vectors. Our formulation is compatible with these architectures and 
does not require a customized sampling protocol. 

It is important to mention that we are not the first authors to recognize the potential role 
that DPSS's (or their continuous-time counterparts, the Prolate Spheroidal Wave Functions, or 
PSWF's [42, 43, 63, 67]) can play in CS. Izu and Lakey [40] have drawn an analogy between sampling 
bounds for multiband signals and classical results in CS, but not specifically for the purpose of 
using the finite-dimensional CS framework for sparse recovery of sample vectors from multiband 
analog signals. Gosse [37] has considered the recovery of smooth functions from random samples; 
however, this work focuses on a very different setting, employing a PSWF (not DPSS) dictionary, 
considering only baseband signals, and exploiting sparsity in a different way than our work. Senay 
et al. [60, 61] have also considered a PSWF dictionary for reconstruction of signals from nonuniform 
samples; however, this work also focuses on baseband signals and lacks formal approximation and 
CS recovery guarantees. Oh et al. [53] have employed a modulated DPSS dictionary for sampled 
bandpass signals; however, this work falls largely outside the standard CS framework and again lacks 
formal approximation and CS recovery guarantees of the type we provide. Finally, Sejdic et al. [59] 
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have proposed a multiband modulated DPSS dictionary very similar to our own and a greedy 
algorithm for signal decomposition in that dictionary. However, this work is again not devoted 
to developing sparse approximation guarantees for sampled multiband signals. It focuses not on 
signal recovery but on identification of a communications channel, and the proposed reconstruction 
algorithm is not intended to exploit block-sparse structure in the signal coefficients. We hope that 
our paper will be a valuable addition to this nascent literature and help to encourage much further 
exploration of the connections between DPSS's, PSWF's, and CS. 

1.4 Preliminaries 

Before proceeding, we first briefly introduce some mathematical notation that we will use through- 
out the paper. We use bold characters to indicate finite-dimensional vectors and matrices. All such 
vectors and matrices are indexed beginning at 0, so that the first clement of a length- A?^ vector x is 
given by x[0] and the last by x[N — 1]. We denote the Hermetian transpose of a matrix A by . 
We use ||-||p to denote the standard £p norm. We also use ||£c||q := |supp(a;)| to denote the number 
of nonzeros of x, and we say that x is S'-sparse if ||a;||Q = S. We use E [x] to denote expected value 
of a random variable x and P [E] to denote the probability of an event E. Finally, we adopt the 
convention j = \f—\ throughout the paper. 

2 Mapping Analog Sensing to the Digital Domain 

In the standard CS setting, one is concerned with recovering a finite-dimensional vector x G 
from a limited number of measurements. A typical first order assumption is that the vector x is 
sparse, meaning that there exists some basis or dictionary ^ such that x = *a and ex. has a small 
number of nonzeros, i.e., ||q:||q < S for some S <^ N. One then acquires the measurements 

y = Ax + e (1) 

where A G <C^^^^ maps x to a length-M vector of complex- valued measurements, and where e is 
a length-M vector that represents measurement noise generated by the acquisition hardware. In 
the context of CS, one seeks to design A so that M is on the order of S (the number of degrees of 
freedom of the signal) and potentially much smaller than N. 

In the present work, however, we are concerned with the acquisition of a finite-length window 
of a complex- valued, continuous-time signal, which we denote by x{t). Specifically, we suppose that 
we are interested in a time window of length T„ seconds and that we acquire the measurements 

y = Hxit)) + e, (2) 

where $ is a linear measurement operator that maps functions defined on [0, T^) to a length-M 
vector of measurements and e again represents measurement noise. We assume throughout this 
paper that x{t) is bandlimited with bandlimit Hz, i.e., that x{t) has a continuous-time Fourier 
transform (CTFT) 

/oo 
x{t)e-^^''^*dt 
-OO 

such that X{F) = for |F| > Additional assumptions on x{t) will be specified in Section 4.1.2. 

Because we assume that x{t) is bandlimited and that the measurement process (2) takes place 
over a finite window of time, we restrict our attention to the problem of recovering the Nyquist-rate 
samples of x{t) over this time interval. Specifically, we let Tg < denote a sampling interval (in 
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seconds) chosen to meet the minimum Nyquist samphng rate, and we let x[n] denote the infinite- 
length sequence that would be obtained by uniformly sampling x{t) with sampling period Tg, i.e., 
x[n] = x(nTs). We are interested in a time window of length seconds, during which there are 
N = [^] samples. We let x = [x[0] x[l] ■ ■ ■ x[N — l]]^ denote x[n\ truncated to the N samples 
from to — 1. This paper is specifically devoted to the problem of recovering x, the vector of 
Nyquist-rate samples of x{t) on [0,r„),-'^ from compressive measurements y of the form (2). 

To facilitate this, we first note that the sensing model in (2) is clearly very similar to the 
standard CS model in (1). We briefly describe conditions under which these models are equivalent. 
Recall from the Shannon- Nyquist sampling theorem that x(t) can be perfectly reconstructed from 
x[n] since ^ -^nyq- Specifically, we have the formula 

oo 

x{t) = x[n] sine {t/Tg — n) , (3) 

n=— oo 

where 



sine (t) 



sin(7rt)/(7rt), t Q 
1, t = 0. 



Observe that since $ is linear, we can express each measurement y[m] in (2) simply as the inner 
product between x{t) and some sensing functional 4>m{t)i i-e.,^ 

y[m] = {^m{t),x{t))+e[m]. (4) 

In this case we can use (3) to reduce (4) to 

I oo \ oo 

y[m] = ( 4)m{t), ^ x[n] sine {t/% -n)\ +e[m] = ^ x[n\ sine {t/Ts - n))+e[m]. (5) 

\ n=— oo / n=— oo 

If we let A denote the M x N matrix with entries given by 

A[m, n] = {(l)m{t), sine (t/Ts - n)) 
and let w denote the length-M vector with entries given by 

w[m] = ^ x[n] (^m(i), sine (i/Tg - n)) , (6) 

n<-l 

n>N 

then (2) reduces to 

y = Ax + w + e. (7) 

If the vector w = 0, then (7) is exactly equivalent to the standard CS sensing model in (1). 
Moreover, if w is not zero but is small compared to e, then we can simply absorb w into e and 
again reduce (7) to (1). 



'^Note that our goal is to recover x, which of course carries useful information about x(t), but recovering x may 
not be sufficient for exactly recovering x{t) on the entire window [0,Tw). (This depends on the exact sampling rate 
and the decay of the analog interpolation kernel.) In practice, the methods we describe in this paper for digital 
single-window reconstruction could be implemented in a streaming multi-window setting, and this would allow for a 
more accurate reconstruction of x{i) on the entire window. 

^In our setup, since $ maps functions defined on [0, Tw) to vectors in C*^, we are inherently assuming that 
<pm{t) = outside of [0, Tw), so that the sensing functionals are time-limited. Although certain acquisition systems 
(such as the modulated wideband converter of [48]) do not satisfy this condition, we believe that it is often a reasonable 
assumption in practice and that many acquisition systems can at least be well-approximated as time-limited. 



5 



A precise statement concerning the size of w would depend greatly on the choice of the (f>mit)- 
While a detailed analysis of w for various practical choices of (j)m{t) is beyond the scope of this 
paper, we briefly mention some possible strategies for controlling w. First, one can easily show 
that if each ^m(i) consists of any weighted combination of Dirac delta functions positioned at times 
0, Ts, . . . , (A'^ — l)Ts, then by construction w = 0. This should not be surprising, as in this case it 
is clear that the measurements are simply a linear combination of the Nyquist-rate samples from 
the finite window. Importantly, it is possible to collect measurements of this type without first 
acquiring the Nyquist-rate samples (see, for example, the architecture proposed in [62]), although 
there are also plenty of situations in which one might explicitly apply a matrix multiplication to 
compress data after acquiring a length- vector of Nyquist-rate samples. 

For many architectures used in practice, it will not be the case that w = Q exactly. However, it 
may still be possible to ensure that w remains very small. There are a number of possible routes to 
such a guarantee. For example, the (pmit) could be designed to incorporate a smooth window g{t) so 
that we effectively sample x{t)g{t) instead of x{t), where g{t) is designed to ensure that .T[n](/[n] ~ 
for n < —loin>N. The reconstruction algorithm could then compensate for the effect of g[n\ 
on 0, 1, . . . , A" — 1. Alternatively, by considering a slightly oversampled version of x{t) (so that ijr 
exceeds -Bnyq by some nontrivial amount) it is also possible to replace the sine interpolation kernel 
with one that decays significantly faster, ensuring that the inner products in (6) decay to zero 
extremely quickly [18]. Finally, as we will see below, many constructions of 4>m{i) often involve a 
degree of randomness that could also be leveraged to show that with high probability, the inner 
products in (6) decay even faster. However, since the details will depend greatly on the particular 
architecture used, we leave such an investigation for future work. 

Having argued that the measurement model in (2) can often be expressed in the form (1), we 
now turn to the central theoretical question of this paper: 

Supposing that x{t) obeys the multiband model described in Section 4.1, how can we 
recover x, i.e., the Nyquist-rate samples of x{t) on [0, T^), from compressive measure- 
ments of the form y = Ax + e? 

In order to answer this question, of course, we will need a dictionary ^ that provides a suitably 
sparse representation for x. We devote Section 4 to constructing such a dictionary. In addition to 
a dictionary ^, however, we will also need a reconstruction algorithm that can efficiently recover 
X from the compressive measurements y. While it is certainly possible to apply out-of-the-box CS 
recovery algorithms to this problem, there are certain properties of our dictionary that make the 
recovery problem worthy of further consideration. (In particular, the columns of our dictionary 
^ will typically not be orthogonal, and the sparse coefficient vectors a that arise will tend to 
have structured (block-sparse) sparsity patterns.) In light of these nuances. Section 3 now provides 
additional background on CS that will allow us to formulate a principled recovery technique. 



3 Compressive Sensing Background 
3.1 Sensing matrix design 

Setting aside the question of how to design the sparsity-inducing dictionary we first address 
the problem of designing A. Although many favorable properties for sensing matrices have been 

studied in the context of CS, the most common is the restricted isometry property (RIP) [14]. We 
say that the matrix A* satisfies the RIP of order S if there exists a constant 6s G (0, 1) such that 

^ < ^^^^ < V^Ts^ (8) 
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holds for all a such that ||q:||q < S. In words, preserves the norm of S'-sparse vectors. Note 
that for any pair of vectors a and a' such that ||q!||o = ||ct'|lo = 'S, we have that \\a — cx'Wq < 25". 
This gives us an alternative interpretation of (8) — namely that the RIP of order 25" ensures that 
A* preserves Euclidean distances between iS-sparse vectors a. 

A related concept is what wc call the ^'-RIP (following the notation in [12]). Specifically, we 
say that the matrix A satisfies the *-RIP of order S if there exists a constant Ss G (0, 1) such that 

, II /librvll , 

holds for all a such that ||q:||q < S. When ^ is an orthonormal basis, (8) and (9) are equivalent. 
However, we will be concerned in this paper with non-orthogonal (and even non-square) dictionar- 
ies in which case the RIP and the ^'-RIP are slightly different concepts: the former ensures 
norm preservation of all sparse coefficient vectors a, while the latter ensures norm preservation 
of all signals having a sparse representation x = ^a. In many problems (such as when ^ is an 
overcomplete dictionary), the RIP is considered to be a stronger requirement. 

There are a variety of approaches to constructing matrices that satisfy the RIP or 'J'-RIP, some 
of which are better suited to practical architectures than others. From a theoretical standpoint, 
however, the most fruitful approaches involve the use of random matrices. Specifically, we consider 
matrices constructed as follows: given M and N, we generate a random M x N matrix A by 
choosing the entries A[m, n] as independent and identically distributed (i.i.d.) random variables. 
While it is not strictly necessary, for the sake of simplicity we will consider only real- valued random 
variables, so that A e M^^-^. 

We impose two conditions on the random distribution. First, wc require that the distribution 
is centered and normalized such that E(A[m, n]) = and E(A[m,n]^) = jj. Second, we require 
that the distribution is subgaussian [9, 76], meaning that there exists a constant cq > such that 

E (^e^I'"."]*) < e'^o*' (10) 

for all t G M. Examples of subgaussian distributions include the Gaussian distribution, the 
Rademacher distribution, and the uniform distribution. In general, any distribution with bounded 
support is subgaussian. 

The key property of subgaussian random variables that will be of use in this paper is that for 
any x G C^, the random variable ||Aa;||2 is highly concentrated about ||a3||2. In particular, there 
exists a constant ci{ri) > that depends only on 77 and the constant cq in (10) such that 



I A 11^ II l|2 



^ II 11^ 



where the probability is taken over all draws of the matrix A (see Lemma 6.1 of [27] or [19]).^ 
We leave the constant 01(77) imdcfincd since it will depend both on the particular subgaussian 
distribution under consideration and on the range of r] considered. Importantly, however, for any 
subgaussian distribution and any 77max) we can write ci(r/) = kt?^ for r] < rj^ax with k being a 
constant that depends on certain properties of the distribution [19]. This concentration bound has 



^Thc concentration result in (11) is typically stated for x G instead of C^. The complex case follows from the 
real case by handling the real and imaginary parts separately and then applying the union bound, which results in a 
factor of 4 instead of 2 in front of the exponent. 
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a number of important consequences. Perhaps most important for our purposes is the following 
lemma (an adaptation of Lemma 5.1 in [4]).^ 

Lemma 3.1. Let X denote any S -dimensional subspace of . Fix (5, /? G (0,1). Let A be an 
M X N random matrix with i.i.d. entries chosen from a distribution satisfying (11)- If 

251og(42/J) + log(4/^) 

then with probability exceeding 1 — (3, 

Vl-S \\x\\2 < \\Ax\\2 < Vl + 5 ||a;||2 (13) 

for all X e X. 

When ^ is an orthonormal basis, one can use this lemma to go beyond a single S'-dimensional 
subspace to instead consider all possible subspaces spanned by S columns of thereby establishing 
the RIP for A^. The proof follows that of Theorem 5.2 of [4]. 

Lemma 3.2. Let * be an orthonormal basis for and fix 6,f3 G (0, 1). Let A be an M x N 
random matrix with i.i.d. entries chosen from a distribution satisfying (11). If 

^ ^ 2Slog{A2eN/5S) + logiA/(3) 

ci{5/V2) ^ ' 

with e denoting the base of the natural logarithm, then with probability exceeding 1 — /3, A^ will 
satisfy the RIP of order S with constant 5. 

Proof. This is a simple generalization of Lemma 3.1, which follows from the observation that (8) is 

equivalent to (13) holding for all S'-dimensional subspaces. There are (^) < (eN/S)^ subspaces of 
dimension S aligned with the coordinate axes of and so applying a union bound to Lemma 3.1 
we obtain the desired result. □ 

Prom essentially the same argument, we can also prove for more general dictionaries ^' that A 
will satisfy the *-RIP. 

Corollary 3.1. Let * be an arbitrary N x D matrix and fix S,/3 G (0, 1). Let A be an M x N 
random matrix with i.i.d. entries chosen from a distribution satisfying (11). If 

^ ^ 2S\og{A2eD/5S) + \og{A/P) 
Cl{5/^^2) 

with e denoting the base of the natural logarithm, then with probability exceeding 1 — P, A will 
satisfy the ^-RIP of order S with constant 6. 

As noted above, the random matrix approach is somewhat impractical to build in hardware. 
However, several hardware architectures have been implemented and/or proposed that enable com- 
pressive samples to be acquired in practical settings. Examples include the random demodula- 
tor [73], random filtering [74], the modulated wideband converter [48], random convolution [1,57], 
and the compressive multiplexer [62]. In this paper we will rely on random matrices in the de- 
velopment of our theory, but we will see via simulations that the techniques we propose are also 
applicable to systems that use some of these more practical architectures. 

*The constants in [4] differ from those in Lemma 3.1, but the proof is substantially the same (see [21]). Note that 
in [21] A' is a subspace of rather than C^. In our case we incur an additional factor of 2 in the constant which 
arises as a consequence of the increase in the covering number for a sphere in C'^ (which can easily be derived from 
the fact that there is an isometry between C"^ and K^"^). 
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3.2 CS recovery algorithms 

3.2.1 Greedy and iterative algorithms 

Before we return to the problem of designing we first discuss the question of how to recover the 
vector X from measurements of the form y = Ax + e = A^ot + e. The original CS theory proposed 
^i-minimization as a recovery technique [10,28]. Convex optimization techniques arc powerful 
methods for CS signal recovery, but there also exist a variety of alternative greedy or iterative 
algorithms that are commonly used in practice and that satisfy similar performance guarantees, 
including iterative hard thresholding (IHT) [6], orthogonal matching pursuit (OMP) [24, 26, 54, 72], 
and several more recent variations on OMP [16, 17, 29, 50-52]. 

In this paper we will restrict our attention to two of the most commonly used algorithms in 
practice — IHT and CoSaMP [6, 50]. We begin with IHT, which is probably the simplest of all CS 
recovery algorithms. As is the case for most iterative recovery algorithms, a core component of 
IHT is hard thresholding. Specifically, we define the operator 

, r 1 |Q:[rz], I a [n] I is among the S largest elements of I a I; , , 

hard(a, S')[n] = <^ . (16) 

10, otherwise. 

In words, the hard thresholding operator sets all but the S largest elements of a vector to zero 
(with ties broken according to any arbitrary rule). 

To the best of our knowledge, there are no existing papers that specifically discuss how to 
implement IHT when ^ is not an orthonormal basis or a tight frame (sec [15] for a discussion of 
the latter case). Nonetheless, we can envision two natural (and reasonable) ways that the canonical 
IHT algorithm [6] can be extended to handle a general dictionary. In the first of these variations, 
the algorithm would consist of iteratively applying the update rule 

= hard(a^ + /i(A*)^(y - A*a^), S) (17) 

where ^ is a parameter set by the user. In the second of these variations, the algorithm would 
consist of iteratively applying the update rule 

* • hard (x^ + hA" - Ax^^ ) , . (18) 

When ^ is an orthonormal basis these algorithms are equivalent, but in general they are not. 
On the whole, IHT is a remarkably simple algorithm, but in practice its performance is greatly 
dependent on careful selection and adaptation of the parameter We refer the reader to [6] for 
further details. 

CoSaMP is a somewhat more complicated algorithm, but can be easily understood as breaking 
the recovery problem into two separate sub-problems: identifying the S columns of ^ that best 
represent x and then projecting onto that subspace. The former problem is clearly somewhat 
challenging, but once solved, the latter is relatively straightforward. In particular, if we have 
identified the optimal columns of ^, indexed by the set A, then we can recover x via least-squares. 
In this case, an optimal recovery strategy is to solve the problem: 

X = arg min ||y — Az\\2 s.t. z G TZ{^a), (19) 



z 



where *a denotes the submatrix of * that contains only the columns of * corresponding to the 
index set A and TI{^a) denotes the range of *a. If we let A = A^, then one way to obtain the 

solution to (19) is via the pseudoinverse of Aa, denoted a\. Specifically, we can compute 



OA 



A^y=(^Aj^AAj Af,y and a\A- = (20) 
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Algorithm 1 Compressive Sampling Matching Pursuit (CoSaMP) 
input: A, y, S, stopping criterion 
initialize: = y, = 0, £ = 
while not converged do 
proxy: = A^r^ 

identify: f^^+i = supp(hard(*^h^ 2S)) 
merge: A^+i = supp(hard(*^a;^ S)) U 0^+^ 
update: x = arg min^, \\y — Az\\2 s.t. z € Tl{'if^t+i) 
^e+i ^ ip . hard(*^i, S) 
r^+i = y _ Ax^+^ 
1 = 1 + 1 

end while 
output: X = x^ 



and then set x = While this is certainly not the only approach to solving (19) (as we will see in 
Section 6.1.2), it allows us to easily observe that in the noise- free setting, if the support estimate A 
is correct, then y = Ax = Aaq;|a, and so plugging this into (20) yields a = a (and hence x = x) 
provided that Aa has full column rank. Thus, the central challenge in recovery is to correctly 
identify the set A. CoSaMP and related algorithms solve this problem by iteratively identifying 
likely columns, performing a projection, and then improving the estimate of which columns to use. 

Unfortunately, we are again not aware of any papers that specifically discuss how to implement 
CoSaMP when ^ is not an orthonormal basis. Nonetheless, we can envision two natural extensions 
of the canonical CoSaMP algorithm [50]. One of these is shown in Algorithm 1;^ in a sense, this 
formulation is more analogous to (18) than to (17) because it is focused on recovery of x rather 
than a. However, Algorithm 1 is actually quite flexible and can be invoked in multiple ways. To 
help distinguish among the different possibilities, it will be helpful to introduce the notation 

X = CoSaMP(A,*,y,S) 

to denote the output produced by Algorithm 1 when the arguments {A, y, S) are provided as 
input. Having set this notation, it is also reasonable to consider invoking Algorithm 1 with the 
input arguments {A^,I,y,S). This formulation is more analogous to (17). In this case we will 
denote the output by S = CoSaMP(A'i', /, y, S) since the algorithm will construct and output an 
estimate of a (rather than x) . 

3.2.2 "Model-based" recovery algorithms 

Traditional approaches to CS signal recovery, like those described above, place no prior assumptions 
on supp(q!). Sparsity on its own implies nothing about the locations of the nonzeros, and hence most 
approaches to CS signal recovery treat every possible support as equally likely. However, in many 
practical applications the nonzeros are not distributed completely at random, but rather exhibit a 

^We note that the choice of 25 in the "identify" step is primarify driven by the proof technique, and is not intended 
to be interpreted as an optimal or necessary choice. For example, in [17] it is shown that the choice of S is sufficient 
to cstablisfi performance guarantees similar to those for CoSaMP. It is also important to note that when the number 
of measurements M is very small (less than 3S) it is necessary to make suitable modifications as the assumptions of 
the algorithm are clearly violated in this case. Moreover, a simple extension of CoSaMP as presented here involves 
including an additional orthogonalization step after pruning x down to an S-dimensional estimate, as is also done 
in [17]. This can often result in modest performance gains and is a technique that we exploit in our simulations. 
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degree of structure. In the case of signals exhibiting such structured sparsity, it is possible to both 
reduce the required number of measurements and develop specialized "model-based" algorithms for 
recovery that exploit this structure [3,5]. 

In this paper, we are interested in the model of block- sparsity [3,31]. In a block-sparse vector 
a, the nonzero coefficients cluster in a small number of blocks. Specifically, suppose that x = ^cx 
with * being an N x D matrix and that we decompose * into J submatrices of size N x ^, i.e., 

* = [*o *1 *J-l] 

Then we can write x = X^^q^ where each G C^/"^. We say that oc is i^-block-sparse if 

there exists a set X C {0, 1, . . . , J — 1} such that \I\ = K and = for all i ^X. With some 
abuse of notation, we now let let denote the submatrix of ^ that contains only the columns of 
^ corresponding to the blocks indexed by 1. 

We first illustrate how we can exploit block-sparsity algorithmically. Our goal is to generalize 
IHT and CoSaMP to the block-sparse setting. To do this, we observe that the hard thresholding 
function plays a key role in both algorithms. One way to interpret this role is that hard(^''^a;, S) 
is actually computing a projection of onto the set of ^-sparse vectors. In the case where ^ is 

an orthonormal basis we can also interpret ^ ■ hard(^^a;, S) as projecting x onto the set of signals 
that are S'-sparse with respect to the basis 

In the block-sparse case we must replace hard thresholding with an appropriate operator that 
takes a candidate signal and finds the closest K-block-sparse approximation. Towards this end, we 
define 

iS*(£C, K) := arg min min ||a; — . (21) 
\x\<K ze7e(*x) 

5^(x, K) is analogous to the support of a in the traditional sparse setting: it tells us which set of 
K blocks of ^ can best approximate x. Along with 5*(a;, K), we also define 

7'*(a;,K) := arg min||a;-z||2 s.t. z ^'R{^s^{x,k)) ■ (22) 

z 

V^! is simply the projection of the vector x G onto the set of K-block-sparse signals. To simplify 
our notation, we will often write S{<x,K) = Sif,{ot,K) and V{x,K) = V\s,{x,K) when * is clear 
from the context. 

For each of the IHT and CoSaMP algorithms proposed in Section 3.2.1, it is possible to propose 
a variation of the algorithm designed to exploit block-sparsity simply by replacing the hard thresh- 
olding operator with an appropriate block-sparse projection. For example, one block-based version 
of IHT (which is also a special case of the iterative projection algorithm in [5]), would consist of 
replacing the core iteration of IHT in (18) with 

x^+^ = r{x^ + fiA"{y - Ax^),K). (23) 

Note that the only difference from (18) is that we have replaced hard thresholding with the pro- 
jection onto the set of block-sparse signals. Similarly, a block-based version of CoSaMP is shown 
in Algorithm 2. 

Algorithm 2 is in differing senses both a special case and a generalization of the model-based 
CoSaMP algorithm proposed in [3] . Specifically, in [3] an algorithm for block-sparse signal recovery 
is proposed that is equivalent to Algorithm 2 when \& = 7. The more general case of arbitrary ^ is 

not discussed. However, there are alternative options for handling ^ ^ I besides the one specified 
in Algorithm 2. Like Algorithm 1, Algorithm 2 is quite fiexible and can be invoked in multiple 
ways. Following our convention in Section 3.2.1, we use the notation x = BBCoSaMP(A, y, K) 
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Algorithm 2 Block-Based CoSaMP (BBCoSaMP) 



input: A, y, K, stopping criterion 
initialize: = y, = 0, £ = 
while not converged do 



proxy: n = A r 



identify: X^+i = S{V{h\ 27^), 2K) 
merge: X^+i = S{x^, K) U X^+i 

update: 5 = arg min^, ||t/ — A2II2 s.t. z^TZ{^x^+^) 
a;^+i = V{x, K) 

e = £ + i 



end while 
output: X = X 



to denote the output produced by Algorithm 2 when the arguments {A, y, K) are provided as 
input, and we use the notation S = BBCoSaMP(A*, /, y, if) to denote the output produced by 
Algorithm 2 when the arguments {A'if , I,y, K) are provided as input. 

3.2.3 "Model-based" recovery guarantees 

Theoretical guarantees for standard CS recovery algorithms typically rely on the RIP, and since in 
the standard case any S-sparse signal is possible, there is little room for improvement. However, 
the block-sparse model actually rules out a large number of possible signal supports, and so we 
no longer require the full RIP or 'J'-RIP, i.e., we no longer need (8) or (9) to hold for all possible 
S-sparse signals. Instead we only require that (8) or (9) hold for all a which are i^T-block-sparse. 
We will refer to these relaxed properties as the block-RIP and St'-block-RIP respectively. 

The relaxation to block-sparse signals allows us to potentially dramatically reduce the required 
number of measurements. Specifically, note that a if-block-sparse vector a satisfies ||q;||o < 
In the standard sparsity model we would have that the number of possible subspaces is (^) with 
S = Kj-, whereas now the number of possible subspaces is given by ['^), which can be potentially 
much smaller. Establishing (8) or (9) for a more general union of subspaces is a problem that has 
received some attention in the CS literature [7,32,45]. In our context it should be clear that we 
can simply apply Lemma 3.1 just as in the proofs of Lemma 3.2 and Corollary 3.1 to obtain the 
following improved bounds. 

Corollary 3.2. Let * be an orthonormal basis for and fix S,j3 € (0, 1). Let A be an M x N 
random matrix with i.i.d. entries chosen from a distribution satisfying (11). If 



with e denoting the base of the natural logarithm, then with probability exceeding 1 — /3, A^ will 
satisfy the block-RIP of order K with constant 5. 

Corollary 3.3. Let * be an arbitrary N x D matrix and fix 6,(3 G (0, 1). Let A be an M x N 
random matrix with i.i.d. entries chosen from a distribution satisfying (11). If 



M > 



2K log(42/,5) + \og{eJ/K)) + log(4/;3) 



(24) 



M > 



2K log(42/<5) + \og{eJ/K)) + log(4//3) 
ci(VV2) 



(25) 
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with e denoting the base of the natural logarithm, then with probability exceeding 1 — P, A will 
satisfy the ^ -block-RIP of order K with constant 5. 

The measurement requirements in (24) and (25) represent improvements over (14) or (15) in 
that a straightforward apphcation of (14) or (15) would lead to replacing the \og{eJ/K) term above 
with =jlog(eJ/iC) or ^log(eJ/K) respectively. 

We can combine these corollaries with the following theorems to show that we can stably 
recover block-sparse signals using fewer measurements. Note that the following theorems are sim- 
plified guarantees for the case of only approximately block-sparse signals. Both theorems can be 
generalized to block-compressible signals,® but we restrict our attention to the guarantees for sparse 
signals to simplify discussion. 

Theorem 3.1 (Theorem 2 from [5]). Suppose that x can be written as x = ^cx where a is K- 
block-sparse and that we observe y = A^oc + e = Ax + e. If A satisfies the ^-block-RIP of 
order 2K with constant S-zk < 0.2 and ^ € [1 + 52K-, 1.5(1 — 52k)), then the estimate obtained from 
block-based IHT (23) satisfies 

\\x — x\\2 < Hi ||e||2 (26) 
where ni> 1 is a constant determined by 62K and the stopping criterion. 

Theorem 3.2 (Theorem 6 from [3]). Suppose that a is K -block- sparse and that we observe y = 
A^a e. //A* satisfies the block-RIP of order AK with constant 64K < 0.1, then the output of 
block-based CoSaMP (Algorithm 2) with a = BBCoSaMP(A*, /, y, ivT) satisfies 

||q; — a||2 < ||e||2 (27) 

where K2 > 1 is a constant determined by 64K and the stopping criterion. 

Theorems 3.1 and 3.2 show that measurement noise has a controlled impact on the amount 
of noise in the reconstruction. However, note that Theorems 3.1 and 3.2 are fundamentally dif- 
ferent from one another when the matrix ^ is not an orthonormal basis. Theorem 3.2 requires 
the assumption that A^ satisfies the block-RIP while Theorem 3.1 requires that A satisfies the 
*-block-RIP. Theorem 3.2 also provides a different guarantee (recovery of a) compared to Theo- 
rem 3.1 (which guarantees recovery of a;). In the case where ^ is an orthonormal basis, we could 
immediately set x = so that the recovery guarantee in (27) applies to the error in x as well. 
However, for arbitrary dictionaries this equivalence no longer holds. ^ We conjecture that were 
we to instead consider x = BBCoSaMP(A, y, K) we should be able to establish a theorem for 
block-based CoSaMP analogous to Theorem 3.1, but we leave such an analysis for future work. 

In Section 4 we develop a multiband modulated DPSS dictionary ^ designed to offer high- 
quality block-sparse representations for sampled multiband signals. Using this dictionary we estab- 
lish block-RIP and ^-block-RIP guarantees in Section 5, which allows us to translate Theorems 3.1 
and 3.2 into guarantees for recovery of sampled multiband signals from compressive measurements. 
When we then turn to implement these algorithms, however, it is important to note that, although 
we can implement the BBCoSaMP(A*, /, y, iC) version of block-based CoSaMP with no trouble, 

®By block-compressible, we mean signals that axe well-approximated by a block-sparse signal. The guarantee on 
the recovery error for block-compressible signals is similar to those in Theorems 3.1 and 3.2 but includes an additional 

additive component that quantifies the error incurred by approximating a with a block-sparse signal. If a signal is 
close to being block-sparse, then this error is negligible, but if a signal is not block-sparse at all, then this error can 
potentially be large. 

'^It is also worth noting that in the context of traditional (as opposed to model-based) CS, there do exist guarantees 
on II S — a; II 2 for general ^' when using an alternative optimization-based approach combined with the ^-RIP [12]. 
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there is an important caveat to tlic results for block-based IHT and the BBCoSaMP(A, i^) 
version of block-based CoSaMP. Specifically, both of those latter algorithms require that we be able 
to compute 'P{x,K) as defined in (22). Unfortunately, because our dictionary * is not orthonor- 
mal we are aware of no algorithms that are guaranteed to solve this problem. However, we will 
see empirically in Section 6 that we can attempt to solve this problem by applying many of the 
same algorithms commonly used for CS recovery. In other words, we can perfectly implement an 
algorithm (BBCoSaMP(A^', /, y, K)) that has a provable guarantee on the recovery of a, and we 
can approximately implement algorithms (block-based IHT and BBCoSaMP(A, ^,y,K)) that are 
intended to accurately recover x. We have implemented both variations of block-based CoSaMP, 
and while both perform well in practice, the empirical performance of BBCoSaMP(A, y, ii') 
turns out to be superior. In Section 6 we present a suite of simulations demonstrating the re- 
markable effectiveness of BBCoSaMP(A, y, K) (when combined with the multiband modulated 
DPSS dictionary) in recovering sampled multiband signals from compressive measurements. 

4 A Sparse Dictionary for Sampled Multiband Signals 
4.1 Analog signal models 

We now confront the problem of designing a dictionary ^ in which a;, a length- A/^ vector of Nyquist- 
rate samples of x(t), will be sparse or compressible. This is typically a significant challenge to 
anyone applying CS techniques to analog signals, since many natural analog signal models cannot 
be obviously represented via a simple basis ^ . We now describe two of the most appealing analog 
signal models and discuss the degree to which these models can fit within the CS framework. 

4.1.1 Multitone signals 

There are a variety of possibilities for quantifying the notion of sparsity in a continuous-time signal 
x{t). Perhaps the simplest, dating back at least to the work of Prony [56], is the multitone model, 
which assumes that x{t) can be expressed as 

s-i 

x{t) = Y,ake^^^^>'\ (28) 

fe=0 

i.e., x{t) is simply a weighted combination of S complex exponentials of arbitrary frequencies. A 
related model is given by 

7V-1 

x{t) = aHe^2-("-^/2+i)*/^^=, (29) 

n=0 

where ||q;||o = S. This model is considered in [73], which provided one of the first extensions of 
the CS framework to the case of continuous-time signals. The advantage of this model is that (29) 
implies that x = ^a, where x is the vector of discrete-time samples of .T(t) on [0,T„) and ^ 
is the non-normalized N x N DFT matrix with suitably ordered columns. Thus, the model (29) 
immediately fits into the standard CS framework when the vector of coefficients a is sparse. 

In practice, however, this approach is inadequate for two main reasons. First, the model in (29) 
assumes that any tones in the signal have a frequency that lies exactly on the so-called Nyquist- 
grid, i.e., the tones are bounded harmonics. When dealing with tones of arbitrary frequencies, the 
corresponding "DFT leakage" results in ot that are not sparse and are not even well-approximated 
as sparse. In this case it can be useful to either incorporate a smooth windowing function into 
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the measurement system, as in [73], or to consider the less restrictive model in (28), as in [30]. 
However, these approaches do not address the second main objection, which is that many (if not 
most) real-world signals are not mere combinations of a few pure tones. For a variety of reasons, it 
is typically more realistic to assume that each of the S signal components has some non-negligible 
bandwidth, which leads us to instead consider the following extension of the multitone model. 



4.1.2 Multiband signals 

For the remainder of this paper, we will focus on multiband signals, or signals whose Fourier 
transform is concentrated on a small number of continuous intervals or bands. Towards this end, 
for a general continuous-time signal x{t), we define T{x) as the support of X{F), i.e., J-'{x) = {F e 
M : X{F) / 0}. We will be interested in signals for which we can decompose = T{x) into a 
union of K continuous intervals, so that we can write 

K-l 
i=0 

In the most general setting, we would allow the endpoints of each interval to be arbitrary but subject 
to a restriction on the total Lebesgue measure of their union. See, for example, [8,34,35,75]. In 
this paper we restrict ourselves to a simpler model. Specifically, we divide the range of possible 
frequencies from to into J = ^"^^^ equal bands of width i?band and require X{F) to be 

supported on at most K of these bands. An example is illustrated in Figure 1. More formally, we 
define the i*^ band as 



A,: 



— h ^i3band) 7^ \- [.^ + J-j-Dband 



We then define 



r{K, Bband) = |.F : C [J Aj for some X C {0, 1, . . . , J - 1} with jXj < 

as the set of all possible supports. Using this, we define 

M{K, Sband) = {x{t) : T{x) C F' for some F' G T{K, 5ba„d)} (30) 

as the set of multiband signals. Note that the total occupied bandwidth is at most iCSband- Our 
interest is in the setting where KSband -^nyq; so that if we knew a priori where the K bands 
were located, we could acquire x(t) in a streaming setting with only i^Bband samples per second. 
(This is the so-called Landau rate [41].) Our goal is to acquire finite windows of multiband signals 
without such a priori knowledge while keeping the measurement rate as close as possible to the 
Landau rate. 

Note, however, that the set M.{K, -Bband) is defined for infinite-length signals x{t). Indeed, any 
signal with a Fourier transform supported on a finite range of frequencies cannot also be supported 
on a finite range of time. This would seem to be somewhat at odds with the finite-dimensional CS 
framework described above. As a result, previous efforts aimed at sampling multiband signals have 
developed largely outside the framework of CS [8,34,35,47,48,75]. It is our goal in this paper to 
show that it is possible to recover a finite-length window of samples of a multiband signal using 
many of the standard tools from CS. To do this, we will need to construct an appropriate dictionary 
^ for capturing the structure of this set, which we do in the following section. 
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Figure 1: Illustration showing the CTFT of a nmltiband signal where J — g"^" — 10 and K = 3. 



Finally, we also note that while our signal model breaks up the spectrum into J = „ "^'^ bands 
with fixed boundaries and bandwidth, it actually encompasses the broader class of signals where 
the bandwidth and center frequency of each band are arbitrary. For example, a signal with K bands 
of width -Bband but with arbitrary center frequencies will automatically lie within A^(2X, i?band)- 
Since we are primarily interested in the case where -fC-Bband ^ -^nyq, this factor of 2 will not be 
significant in the development of our theoretical results. However, we do note that in practice it 
may be possible to achieve a significant gain by exploiting a more flexible signal model. 

In the remainder of this section we demonstrate that it is possible to construct discrete-time 
bases using Discrete Prolate Spheroidal Sequences (DPSS's) that efficiently capture the structure 
of sampled multiband signals. We first review DPSS's and their key properties as first developed 
in [65] , and we then discuss some of the consequences of these properties in terms of their utility 
in representing sampled continuous-time signals. Ultimately, we demonstrate how to use DPSS's 
to construct a dictionary ^ which sparsely represents windows of sampled multiband signals. 



4.2 Discrete Prolate Spheroidal Sequences (DPSS's) 

Our goal in this subsection is to provide a self-contained review of the concepts from Slepian's 
work on DPSS's [65] that are most relevant to the problem of modeling sampled multiband signals. 
We refer the reader to [42,43,63-67] for more complete overviews of DPSS's, PSWF's, and their 
implications in time-frequency localization. 



4.2.1 DPSS's 

Let be an integer, and let < < 5. Given N and W, the DPSS's are a collection of N 
discrete-time sequences that are strictly bandlimited to the digital frequency range |/| < yet 
highly concentrated in time to the index range n = 0, 1, . . . , — 1. The DPSS's are defined to 
be the eigenvectors of a two-step procedure in which one first time-limits the sequence and then 
bandlimits the sequence. Before we can state a more formal definition, let us note that for a given 
discrete-time signal x[n], we let 

00 

X{f) = ^ x[n]e-^-2-/" 

n=— 00 
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denote the discrete-time Fourier transform (DTFT) of x[n].^ Next, we let Bw denote an operator 
that takes a discrete-time signal, bandlimits its DTFT to the frequency range |/| < W, and 
returns the corresponding signal in the time domain. Additionally, we let T/v denote an operator 
that takes an infinite-length discrete-time signal and zeros out all entries outside the index range 
{0, 1, . . . , iV — 1} (but still returns an infinite-length signal). With these definitions, the DPSS's 
are defined as follows. 



Definition 4.1. [65] Given N and W , the Discrete Prolate Spheroidal Sequences (DPSS's) are 

a collection of N real-valued dis 
corresponding scalar eigenvalues 



a collection of N real-valued discrete-time sequences s^nw^^n^w^-'-i^n^vv^^ that, along with the 



satisfy 

for all £ e {0,1,. .. ,N — 1}. The DPSS's are normalized so that 



Bw{Tn{s^S.w)) = ^N.w4]w (31) 



|rAr(4V)||2 = l (32) 



for allie{0,l,...,N-l}. 



As we discuss in more detail below in Section 4.2.4, the eigenvalues , ^n^w ^ ■ ■ ■ ■> ^N^w^^ 
have a very distinctive behavior: the first 2NW eigenvalues tend to cluster extremely close to 1, 
while the remaining eigenvalues tend to cluster similarly close to 0. 

Before proceeding, let us briefly mention several key properties of the DPSS's that will be useful 
in our subsequent analysis. First, it is clear from (31) that the DPSS's are, by definition, strictly 
bandlimited to the digital frequency range |/| < W. Second, the DPSS's are also approximately 
time- limited to the index range n = 0, l,...,Ar — 1. Specifically, it can be shown that [65] 

^Nwh = r^^- (33) 

Comparing (32) with (33), we see that for values of i where A^^-*^ 1, nearly all of the energy in 
4w contained in the interval {0, 1, . . . , AT — 1}. Third, the DPSS's are orthogonal over Z [65], 
i.e., for any £,£' e {0,1, . . . , N - 1} with £ ^ £' , (^J^, 4'}^) = 0. 



4.2.2 Time-limited DPSS's 

While each DPSS actually has infinite support in time, several very useful properties hold for the 
collection of signals one obtains by time-limiting the DPSS's to the index range n = 0, 1, . . . , A — 1. 

First, the time-limited DPSS's T/v(s7vV)' ^('^TvV)' ■ ■ • ^'^^i^^NW^) approximately bandlimited 
to the digital frequency range j/j < W. Specifically, from (31) and (33), one can deduce that 

\\Bw{TN{s'-i]^m2 = ^f>^. (34) 

®Note that we use lower-case / to indicate the digital frequency (so that X{f) represents the DTFT of a discrete- 
time sequence x[n], while X{F) represents the CTFT of a continuous-time signal x{t)). 
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£ = £^127 £ = 511 £ = 767 



Figure 2: Illustration of four example DPSS's time-limited to the interval [0, 1, . . . , iV— 1] and the magnitude 
of their DTFT's. In this example, N = 1024 and W = \. Note that for £ up to approximately 2NW - 1 
the energy of the spectrum is highly concentrated on the interval [—W, W], and when £ is sufEciently larger 
than 2NW — 1 the energy of the spectrum is concentrated almost entirely outside the interval [—W, W]. 

Comparing (32) with (34), we see that for values of i where X^pf^r ~ 1, nearly all of the energy 

in T/v(s7v''vv') contained in the frequencies |/| < W. An illustration of four representative time- 
limited DPSS's and their DTFT's is provided in Figure 2. While by construction the DTFT of 
any DPSS is perfectly bandlimited, the DTFT of the corresponding time-limited DPSS will only be 
concentrated in the bandwidth of interest for the first 2NW DPSS's. As a result, we will frequently 
be primarily interested in roughly the first 2NW DPSS's. 

Second, the time-limited DPSS's are also orthogonal [65] so that for any £,£' £ {0,1, . . . ,N — 1} 
with £y^£', 

{Tn{s^S^w)^T'n{s^nw))=^- (35) 
Finally, like the DPSS's, the time-limited DPSS's have a special eigenvalue relationship with the 
time-limiting and bandlimiting operators. In particular, if we apply the operator T/v to both sides 
of (31), we see that the sequences T/v(-S7v^pi/) are actually eigenfunctions of the two-step procedure 
in which one first bandlimits a sequence and then time-limits the sequence. 

4.2.3 DPSS vectors 

Because our focus in this paper is primarily on representing and reconstructing finite-length vectors, 
we will find the following restriction of the time-limited DPSS's to be useful, where we restrict the 
domain exclusively to the index range n = 0, 1, . . . , — 1 (discarding the zeros outside this range). 

Definition 4.2. Given N and W , the DPSS vectors sj^'*^, ^^nw^ ■ • • > ^TvW^^ ^ '^'"'^ defined by 
restricting the time-limited DPSS's to the index range n = 0,1, . . . , N — 1: 

for all i,n€{0,l,...,N -1}. 
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Figure 3: Eigenvalue concentration for N = 1024 and W — \. Note that the first 2NW — 512 eigenvalues 
cluster near 1 and the remaining eigenvalues rapidly approach the level of machine precision. 



Following from our discussion in Section 4.2.2, the DPSS vectors obey several favorable prop- 
erties. First, combining (32) and (35), it follows that the DPSS vectors form an orthonormal basis 
for (or for M^). However, as we discuss in subsequent sections, bases constructed using just 
the first ~ 2NW DPSS vectors can be remarkably effective for capturing the energy in our signals 
of interest. Second, if we define -B^viy to be the N x N matrix with entries given by 



BN,w['m,n] := 21^ sine {2W{m — n)) , 



(36) 



we see that the eigenvectors of B]\i^y/ are given by the DPSS vectors sf^^^, ^tvV' • • • > ^^NW^^ ^'^^ 



A 



(7V-1) 



the corresponding eigenvalues are Ajy ,y , . . . , 

vectors into an N x N matrix 



[65]. Thus, if we concatenate the DPSS 



>N,W '■- 



^{N-1) 



iNxN 



A 



(37) 



(TV-l) 



and let A^^w denote an N x N diagonal matrix with the DPSS eigenvalues A]^ , . . . , ^ 

along the main diagonal, then we can write the eigendecomposition of Bn^w as 



Bn,w = Sn,w-An,wS^^\y. 
This decomposition will prove useful in our analysis below. 



(38) 



4.2.4 Eigenvalue concentration 



As mentioned above, the eigenvalues A^^^^^, A^^^^, . . . , 



X^A^TxP have a very distinctive and important 



behavior: the first 2NW eigenvalues tend to cluster extremely close to 1, while the remaining 
eigenvalues tend to cluster similarly close to 0. This behavior — which will allow us to construct 
efficient bases using small numbers of DPSS vectors — is illustrated in Figure 3 and captured more 
formally in the following results. 

Lemma 4.1 (Eigenvalues that cluster near one [65]). Suppose that W is fixed, and let e G (0,1) 
be fixed. Then there exist constants Ci,C2 (where C2 rnay depend on W and e) and an integer Nq 
(which may also depend on W and e) such that 



a(^) > 1 



Cie 



-C2N 



for ah e < 2NW{1 - e) and ah > A'o- 



(39) 
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Lemma 4.2 (Eigenvalues that cluster near zero [65]). Suppose that W is fixed, and let e G (0, — 

1) he fixed. Then there exist constants C^, C/^ (where C4 may depend on W and e) and an integer 
Ni (which may also depend on W and e) such that 

Ajiy < Cge-^^^ for all i > 2NW{1 + e) and all N > Ni. (40) 

Alternatively, suppose that W is fixed, and let a > be fixed. Then there exist constants C^,Cq 
and an integer N2 ( where N2 may depend on W and a ) such that 

aSJV ^ C'se"^^" for all I > 2NW + a\og{N) and all N > N2. 



On occasion, we will have a need to compute bounds on sums of the eigenvalues. First, we note 
the following. 

Lemma 4.3. 

N-l 

>^%w = tTace{BN,w) = 2NW. (41) 

The following results will also prove useful. 
Corollary 4.1. Suppose that W is fixed, let e G (0, 1), and let k = 2NW{1 - e). Then for N > Nq, 

N-l 



^A^;^<2W(6 + Cie-^^^), 



e=k 

where Ci, C2, and Nq are as specified in Lemma 4-1- 
Proof. It follows from (39) and (41) that 

N-l k-1 

e=k e=o 

< 2NW - 2NW{1 - e) (1 - Cie-^^^) 

= 2NW (1 - (1 - e) (1 - 

= 2NW (e + Cie-^2iv _ ^Cie-^^^) 

< 2NW (e + Cie-^^^) (42) 

for N >No. □ 

Corollary 4.2. Suppose that W is fixed, let e € (0, 2^ — 1), and let k = 2NW{1 + e). Then for 
for N > max(A^o,^i), 

X; Ajy < 2Wmin (e + C.e'^^^ W''''''') ' ^''^ 
e=k ^ ^ 

where C\, C2, C3, C4, Nq, and Ni are as specified in Lemma 4-1 and Lemma 4-^- 

Proof. This result follows simply from Corollary 4.1 and from (40). □ 
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4.3 DPSS bases for sampled bandpass signals 



Using the DPSS vectors, it is possible to construct remarkably efficient bases for representing 

the discrete-time vectors that arise when collecting finite numbers of samples from most multiband 
signals (e.g., signals in M.{K, -Bband))- Before presenting our full construction, however, we illustrate 
the basic concepts using sampled bandpass signals (e.g., signals in (1, .Bband))- 



4.3.1 A bandpass modulated DPSS basis 

For the moment, we restrict our attention to vectors of samples acquired from a continuous-time 

bandpass signal x{t). We assume that J-{x), the support of X{F), is restricted to some interval 
[Fc — -^band ^ By^^nd j ^ -^^j^gj-g ^^jg ccutcr frcqucucy Fc and width i?band are known. Wc define 
X = [x{0) x{Ts) ■ ■ ■ x({N — l)Ts)]'^ as in Section 3 to be a finite-length vector of samples of x{t) 
collected uniformly with a sampling interval Tg < l/(2max{|Fc it ^^^f^l}). 

As a basis for efficiently representing many such vectors x, we propose the following. First, let 
W = :5baadls^ j^j^j (37), Ict S N ,w dcuotc the N x N matrix containing the N DPSS vectors 

(constructed with parameters N and W) as columns. Next, define fc = FcTs and let Ej^ denote 
an N X N diagonal matrix with entries 

Ef [m,n] :=\ ^ , m - n, 

Multiplying a vector by Ef^ simply modulates that vector by a frequency fc- Finally, consider the 
NxN matrix Ej^S^yVj whose columns are given by the DPSS vectors, each modulated by fc- One 
can easily check that Ef^S^^w forms an orthonormal basis for C^^, since {E f^S n^w)^ E f^S n,w = 
{SN,w)"{EfJ"Ef^SN,w = {Sn,w)"Sn,w = I For a given integer A; G {1, 2, . . .' , N}, we let 

[Ef^SN,w]k 

denote the N x k matrix formed by taking the first k columns of Ej^Sn^w- We will see that taking 
[E f^SN,w]i; with k 2NW forms an efficient basis that, to a high degree of accuracy, captures 
most sample vectors x that can arise from sampling bandpass signals. 



4.3.2 Approximation quality for sampled complex exponentials 



To best illustrate one of our key points regarding the approximation of bandpass signals, let us first 
restrict our focus to the simplest possible bandpass signals: pure complex exponentials. Specifically, 
consider a continuous-time signal of the form x(t) = e^^'^^^ where the frequency F belongs to the 
interval [Fc - %^,Fc + %^]. We define x = [x{0) x{%) ■ ■ ■ x{{N - l)Ts)f, fc = FcT^, and 



W = 



as in Section 4.3.1. Also, defining 



ej27r/0 



pi27r/(Af-l) 



for all / G [/c — W, fc + W], we note that the sample vector x will equal e/ for / = FTg- Without 
knowing the exact carrier frequency F in advance, we ask whether it is possible to define an efficient 
low-dimensional basis for capturing the energy in any sample vector x that could arise in this model. 
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At first glance, this problem may appear difficult or impossible. The infinite set of possible 
sample vectors {^f} fe[fc-W,fc+W] traverses a 1-dimensional manifold (parameterized by /) within 
C''^. Technically speaking, these vectors collectively span all of C^. What is remarkable, however, 
is that to a high degree of accuracy, these vectors are approximately concentrated along a very 
low-dimensional subspace of C^. Moreover, for any A; G {1, 2, ... , N}, it is possible to analytically 
find the best fe-dimensional subspace that minimizes the average squared distance from the vectors 

to the subspace, and this subspace is spanned by the first k modulated DPSS vectors. 

To see this, we let Q denote a subspace of C''^ and Pq denote the orthogonal projection operator 
onto Q. We would like to minimize 

^ / ^ WepT. - PqsftAI dF= — . lie/ - Pge/Hl df (45) 

over all subspaces Q of some prescribed dimension k. As we show below in Theorem 4.1, this 
minimization problem can be solved by relating it to the Karhunen-Loeve (KL) transform.^ For the 
benefit of the reader, we briefly review the relevant concepts from the KL transform in Appendix A. 

Theorem 4.1. For any k with k G {1, 2, . . . , N}, the k- dimensional subspace which minimizes (45) 
is spanned by the columns of Q = [E f^S N,w]k7 i-^-, by the first k DPSS vectors modulated to center 
frequency fc- Furthermore, with this choice of Q, we will have 

For a point of comparison, each sampled sinusoid has energy ||ej||2 = iV. 

Proof. See Appendix B. □ 

Recall that the first ~ 2NW DPSS eigenvalues are very close to 1 and the rest are small, so we 
are guaranteed a high degree of approximation to the sampled sinusoids if we choose k 2NW. 
In particular, if we choose k = 2NW{1 — e), it follows from Corollary 4.1 that 

N-l 

f-k 

for N > Nq. Alternatively, if we choose k = 2NW{1 + e), it follows from Corollary 4.2 that 

— V A^^) < Nmm(f4-r,p-'^^^ ^3 ^-C^n\ 
2w2^^^'W- V^^^"^ ' 2W I 

e=k ^ ^ 

for > max(A'o) A^i)- Since each sampled sinusoid has energy ||e/||2 = A*", for the subspace we have 
chosen (say with k = 2NW{1 — e)), the average relative approximation error across the sampled 
sinusoids is bounded by a small fraction e + Cie~^'^^ of the energy of a given sinusoid. 

It is important to note that, while we are guaranteed a very high degree of approximation 
accuracy in an MSE sense, we are not guaranteed such accuracy uniformly over all sampled sinusoids 
in our band of interest. A relatively small number of sinusoids may have higher values of ||e/ — 
PQe/|||, and in practice this diminished approximation performance tends to occur for those 
sinusoids with frequencies near the edge of the band (i.e., for / near fc ± W). 

^Thc observation that a connection exists between DPSS's and the KL transform is not a new one (see, for 
example, [33,39,49,58]). 
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Figure 4: DPSS bases efficiently capture the energy in sampled complex exponentials, (a) SNR captured 
from three sampled complex exponentials (with differing frequencies f) as a function of the number k of 
DPSS basis elements, (b) SNR captured as a function of f for four fixed values of k. In both plots, N = 1024 



and W =1. 



This behavior is illustrated in Figure 4. In Figure 4(a) we consider three possible frequencies 
(/ = 0, |, or ^) and show the ability of the baseband DPSS basis (with /c = and W = ^) to 
capture the energy in these sinusoids as a function of how many DPSS vectors are added to the 
basis. The ability of a basis to capture a given signal is quantified through 

SNR = 201ogio (j. """f,"' II ) dB. 

Overall, we observe broadly similar behavior for each frequency in that by adding slightly more 
than 2A^1^ DPSS vectors to our basis, we capture essentially all of the energy in the original signal. 
However, we do observe slightly different behavior for the sinusoid with a frequency exactly at W. 
In this case we capture very little of the energy in the signal until we have added more than 2NW 
vectors, while for lower frequencies we begin to do well before this point. 

To illustrate this phenomenon. Figure 4(b) considers four different sizes for the DPSS basis and 
shows the SNR as a function of the frequency of the sinusoid. In the cases where k = 500 < 2NW 
and k = 2NW we see somewhat similar behavior — we are capturing a good fraction, but not all, of 
any sinusoid whose frequency is not too close to W. We see a dramatic difference when we increase 
k slightly to 560, at which point we are capturing virtually all of the energy in any sinusoid within 
the band of interest. However, this eventually has a potentially problematic side-effect, as we can 
see by further increasing k to 700. Specifically, as we continue to increase the size of the DPSS basis 
we begin to capture energy of sinusoids that lie outside the targeted band as well. This tradeoff will 
play an important role in the selection of the appropriate k in the algorithms we propose below. 



4.3.3 Approximation quality for sampled bandpass signals 

The above analysis shows that sampled sinusoids, on average, are well- approximated by modulated 
DPSS bases. This is a strong indication that such bases might also be useful for approximating 
more general sampled bandpass signals, since the vectors {ej} f(z[f^-wjc+W] <^ themselves act as 
"building blocks" for representing sampled, bandpass signals in C^. Formally, for any continuous- 
time bandpass signal x{t) with frequency content restricted to the interval [Fc — -^band ^ _|_ -^band j ^ 
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one can show that for each n G {0, 1, . . . , iV — 1}, 



x[n] := x{n%) = / X{F)e^^-^^^^ dF = / X(/)e/[n] df, 

JFc-^i^ Jfc-W 

where we recall that X{F) denotes the CTFT of x{t) and X{f) denotes the DTFT of x[n]. So, 
informally, x can be expressed as a linear combination of (infinitely many) sampled complex expo- 
nentials Bf, where / ranges from fc — W to fc + W. 

Our analysis from Section 4.3.2 allows us to show that in a certain sense, most continuous-time 
bandpass signals, when sampled and time-limited, are well- approximated by the modulated DPSS 
basis. In particular, the following result establishes that bandpass random processes, when sampled 
and time-limited, are in expectation well-approximated. 

Theorem 4.2. Letx(t) denote a continuous-time, zero-mean, wide sense stationary random process 
with power spectrum 



[ 0, otherwise. 



_ ^band P I -Bband 1 



and let x = [x(0) x{Ts) ■ ■ ■ x{{N — l)Ts)]-^ € denote a finite vector of samples acquired from 
x{t) with a sampling interval Tg < 1/(2 max{|Fc =b ^^^^l}). Then over all k-dimensional subspaces 
of , X is best approximated (in an MSB sense) by the subspace spanned by the columns of 
Q = [Ef^SN,w]k, where fc = FcTg and W = The corresponding MSB is given by 

N-l 

^[\\x-PQxg] = — J2^N,w^ (46) 

e=k 

while E [\\x\\l] = N. 

Proof. See Appendix C. □ 

As in our discussion following Theorem 4.1, we can ensure that the MSE in (46) is small 
compared to E [||£c|||] by choosing k ^ 2NW. This suggests that in a probabilistic sense, most 
bandpass signals, when sampled and time-limited, will be well-approximated by a small number of 
modulated DPSS vectors. Again, however, we are not guaranteed such accuracy uniformly over all 
sampled bandpass signals in our band of interest. A relatively small number of bandpass signals x{t) 
could lead to sample vectors x with higher values of ||a; — PgajlH. In particular, recalling that the 
baseband DPSS's are themselves strictly bandlimited, it follows that there exist strictly bandpass 
signals that when sampled and time-limited yield the modulated DPSS vectors. If we restrict Q to 
the first k columns of Ef^Sis[,w, then any bandpass signal producing a sample vector x = Ef^Sp/^^ 
with £> k will have Pqx = and \\x — Pqx\\2 = \\x\\2- Fortunately, Theorem 4.2 confirms that 
such bandpass signals are relatively uncommon: at the risk of belaboring this important point, 
most bandpass signals, when sampled and time-limited, produce sample vectors approximately in 
the span of just the first k « 2NW modulated DPSS vectors. 

On a related note, signal processing engineers often have a sense for how much "spectral leakage" 
to anticipate when collecting a finite window of samples of a continuous-time signal. (Frequently, 
this leakage is reduced via a smooth windowing function [46].) Such practitioners can rest assured 
that, in every case where the spectral leakage is small outside a bandpass range of frequencies, the 
resulting sample vector can be well- approximated by a small number of modulated DPSS vectors. 
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Theorem 4.3. Let x[n] = 71v(a^M) be a time-limited sequence, and suppose that x[n] is approxi- 
mately bandlimited to the frequency range / € [/c — W, fc + W] such that for some S, 

\\Bf^,wx\\l>{l-S)\\x\\l 

where Bf^^w denotes an orthogonal projection operator that takes a discrete-time signal, bandlimits 
its DTFT to the frequency range / G [fc — W, fc + W], and returns the corresponding signal in 
the time domain. Let x G C'^ denote the vector formed, by restricting x[n] to the indices n = 
0,1,..., N -1. Setk = 2NW{1 + e) and let Q = [Ef^SN,w]k- ^^^^ f^"^ ^ ^ ^i' 

||a; - Pqx\\1 < (<5 + iVC73e-^^^)||a;||i, (47) 

where C3, C4, and Ni are as specified in Lemma 4-2. 

Proof. See Appendix D. □ 
Note that, in contrast to Theorem 4.2, Theorem 4.3 requires k to be shghtly larger than 2NW. 



4.4 DPSS dictionaries for sampled multiband signals 

4.4.1 A multiband modulated DPSS dictionary 

In order to construct an efficient dictionary for sampled multiband signals, we propose simply to 
concatenate an ensemble of modulated DPSS bases, each one modulated to the center of a band in 
our model. In particular, in light of the multiband signal model discussed in Section 4.1.2, where 
the bandwidth [— ^%^] is partitioned into bands Aj of size -Bband) let us define the midpoint 
of Aj as 

i^i = -^+ (^ + 05band, ie{0,l,...,J-l}, 

where J = Let W = -^^=^^^7; assume a sampling interval Tg < 5^)) and for each i, let 

fi = FiTs and define 

*i = [Ef,Sj,,w]k (48) 

for some value of k that wc can choose as desired. We construct the multiband modulated DPSS 
dictionary * by concatenating all of the ^f. 

* = [*o *i • • • *J-i] • (49) 

The matrix ^ will have size N xkJ (note that if = 2NW and Tg = * will be square). 

4.4.2 Approximation quality for sampled multiband signals 

In a probabilistic sense, most multiband signals, when sampled and time-limited, will be well- 
approximated by a small number of vectors from the multiband modulated DPSS dictionary. In 
particular, there exists a block-sparse approximation for such sample vectors using only the modu- 
lated DPSS vectors corresponding to the active signal bands. 

Theorem 4.4. Let X C {0, 1, . . . , J — 1} with \I\ = K. Suppose that for each i G I, Xi{t) is a 
continuous-time, zero-mean, wide sense stationary random process with power spectrum 



KB, 



band 



0, otherwise. 
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and furthermore suppose the {xi}i£x o,re independent and jointly wide sense stationary. Let x{t) = 
^^gj and let x = [x{0) x(Ts) ■■■ x{{N — l)Ts)]"^ € denote a finite vector of samples 
acquired from x{t) with a sampling interval of < -jf—- Let denote the concatenation of the 
over all i el, where the *i are as defined in (48) . Then 

E[||a.-P*,x||i] < — ^Ajy, (50) 

whereas E [||a;|||] = AT. 

Proof. See Appendix E. □ 

Theorem 4.4 confirms the existence of a high-quahty block-sparse approximation for most sam- 
pled multiband signals; in particular, the signal approximation vector specified in (50) can be writ- 
ten as Pqf^x = ^oc for a K-block-sparse coefficient vector a given by q:|j = and a\ic = 0. 
As in our previous discussions for bandpass signals, we can ensure the MSE in (50) is small com- 
pared to E [||a;||2] by choosing k ~ 2NW. Compared to previous analysis, however, the MSE 
appearing in Theorem 4.4 is larger by a factor of K (though this quantity may still be quite small). 
Although it may be possible to improve upon this figure, the reader should keep in mind that the 
multiband modulated DPSS dictionary is not necessarily the optimal basis for representing sampled 
multiband signals with a given sparsity pattern I; it is merely a generic (and easily computable) 
dictionary that provides highly accurate approximations for most multiband signals having any 
possible sparsity pattern. 

Although we omit the details here, one could also consider generalizing Theorem 4.3 to multi- 
band signals that have small spectral leakage when windowed in time. 

5 Recovering Sampled Multiband Signals from Random Measure- 
ments 

In this section, we proceed to develop theoretical guarantees for signal recovery using the multiband 
modulated DPSS dictionary ^ as defined in (49). Throughout our theoretical discussion and the 
subsequent experiments, we pay special attention to the role played the number k of DPSS vectors 
per band. We begin in Section 5.1 with a collection of RIP guarantees, and we extend these to 
signal recovery guarantees in Section 5.2. Throughout this section and the remainder of the paper, 
we assume that Tg < . 

JJnyq 

5.1 Embedding guarantees 

We can actually immediately establish 'J'-RIP and 'I'-block-RIP guarantees for any value of k. The 
theorem below follows as a direct consequence of Corollaries 3.1 and 3.3. 

Theorem 5.1. Let k G {l,2,...,N}, set D = kJ = and let * be the N x D multiband 

modulated DPSS dictionary defined in (49) . The following statements hold: 

1. Fix S, l3 £ (0, 1), and let A be an M x N suhgaussian matrix with M satisfying (15). Then 
with probability exceeding 1 — (3, A satisfies the "if -RIP of order S with constant 6. 

2. Fix (5, /3 G (0, 1), and let A be an M x N subgaussian matrix with M satisfying (25). Then 
with probability exceeding 1 — (3, A satisfies the "if -block-RIP of order K with constant 5. 
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In order to establish RIP and bloek-RIP bounds for A^, however, we must restrict our attention 
to values of k that are not too large (this ensures the matrix ^ is not overcomplete) . To be specific, 
we note that for any k, the columns of ^ have unit norm. In addition, when k is suitably small, 
the columns of ^ are approximately orthogonal. 

Lemma 5.1. Let k = 2NW{1 - e), set D = kJ = N{1 - e)BnyqTs < N, and let ^ be the N x D 
multiband modulated DPSS dictionary defined in (49) . Then for any pair of distinct columns Qi , 
q2 in ^, 

|(Qi,Q2)I<3C^/V^ (51) 
if N > Nq, where Ci, C2, and Nq are as specified in Lemma 4-1- 

Proof See Appendix F. □ 

Using this fact, we can ensure that whenever k = 2NW{1 — e), must act as an approximate 
isometry between any coefficient vector a G and the corresponding signal vector a; = ^'ct G C^. 

Lemma 5.2. Let k = 2NW{1 - e), set D = kJ = N{1 - e)BnyqTs < A^, and let be the N x D 
multiband modulated DPSS dictionary defined in (49). Then 

v/l - SNCl/'e-"^ < ^ <Jl + SNCl/'e-"^ (52) 

II0II2 

for allaeC^. 

Proof. The sharpest possible lower and upper bounds in (52) are given by the smallest and 
largest singular values of respectively. Using standard results from linear algebra, crmin(*) = 

Y^Amin(*''^*) and (Ji„ax(*) = y'A^nax(*^^- The Gram matrix has size DxD. All entries 

on the main diagonal of are equal to 1, and all entries off of the main diagonal can be bounded 

using (51). From the Gersgorin circle theorem [36], it follows that all eigenvalues of ^^^^ must 

1/2 _^2^ 1/2 _^2^ 

fall in the interval [1 — 3Z)(7^ e 2^ , 1 + SDC^ e 2^], which for simplicity we note is contained 
within the interval [1 - 3A^Ci' e — ^, 1 + SATCi' e" 2 ] since by assumption D < N. □ 

Lemma 5.2 is the key fact we need to establish RIP and bloek-RIP bounds for A*. 

Theorem 5.2. Let k = 2NW{1 - e), set D = kJ = N{1 - e)BnyqTs < N, and let * be the N X D 

multiband modulated DPSS dictionary defined in (49). The following statements hold: 

1. If A satisfies the ^-RIP of order S with constant 5, then A^ satisfies the RIP of order S 

1/2 02 iV 

with constant 5 + 6NC1 e 2 . 

2. If A satisfies the ^-bloek-RIP of order K with constant 5, then A^ satisfies the block-RIP 

1/2 C2iV 

of order K with constant 5 + 6NC1 e 2 . 
Proof For any o: G such that (9) holds, we can use (52) to conclude that 

Il«ll2 

For the upper bound, note that 

(1 + S)(l + SNC^^e-^^ = 1 + S + SNCl^^e-^ + SSNC^^e 

< l + S + 6NCl'e — ^, 
where the second line follows from the assumption that S < 1. The lower bound follows similarly. □ 



C2N 
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5.2 Recovery gueirantees 



In this section we prove that with a sufficient number of measurements and an appropriately 
constructed multiband modulated DPSS dictionary, most sample vectors of multiband signals can 
be accurately reconstructed. Our proof of this fact relies on two principles. 



5.2.1 Recovery of exactly block-sparse signals 

The first of these principles is that, as a consequence of our RIP results in Section 5.1, signal 
vectors having representations that are exactly if-block-sparse in the dictionary * can be accu- 
rately reconstructed from compressive samples. The following two results follow from combining 
Theorems 3.1, 3.2, 5.1, and 5.2. 

Theorem 5.3. Let k € {1,2,..., N], set D = kJ = and let * be the N x D multiband 

modulated DPSS dictionary defined in (49). Fix 6 G (0, 0.2) and fi G (0, 1), and let A be an M x N 
subgaussian matrix with 

^ > log(42/(^) + log(e J/2i^)) + log(4//3) 

ci{6/V2) ■ ^ ^ 

Then with probability exceeding 1 — /3, the following statement holds: For any x' G that has a 
K -block-sparse representation in the dictionary ^ (i.e., that can be written as x' = ^a' for some 
K -block-sparse vector a' G C^), if we use block-based IHT (23) with ^ £ [1 + S, 1.5(1 — 5)) to 
recover an estimate x of x' from the observations y = Ax' + e, the resulting x will satisfy 

ll^c' — 2II2 < ||e||2 , (54) 

where ki > 1 is as specified in Theorem 3.1. 

Theorem 5.4. Let k = 2NW{1 - e), set D = kJ = N{1 - e)BnyqTs < N, and let * be the 

1/2 _^2^ 

N X D multiband modulated DPSS dictionary defined in (49). Fix (5 G (0, 0.1 — QNC-^ e 2 ) and 
fi G (0, 1). Let A be an M X N subgaussian matrix with 

^ > 8K log(42/,5) + log(eJ/4K)) + log(4//3) 

ci{S/V2) 

Then with probability exceeding 1 — P, the following statement holds: For any x' G that has 
a K -block-sparse representation in the dictionary ^ (i.e., that can be written as x' = ^a' for 
som,e K -block- sparse vector oc' G C^), if we use block-based CoSaMP (Algorithm 2) with a = 
BBCoSaMP(A^, /, y, K) to recover an estimate a of a' from the observations y = Ax' + e, the 
resulting a will satisfy 

||q:' — SII2 < ||e||2 (56) 
where K2 > 1 is as specified in Theorem 3.2. 



5.2.2 Approximating sampled multiband signals with exactly block-sparse signals 

The second of our principles is that most multiband signals with K occupied bands, when sampled 

and time-limited, have a high-quality X-block-sparsc representation in the dictionary Let 
x{t) denote a continuous-time multiband signal with nonzero support on blocks indexed by X C 
{0, 1, . . . , J - 1}, where |X| = K. Let x = [x(0) x{Ts) ■ ■ ■ x{{N - l)Ts)f G denote a finite 
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vector of samples acquired from x{t) with a sampling interval of Tg < Let ot' be a K- 

block-sparse coefficient vector given by a.'\x = and ct'\ic = 0. Defining x' := ^a' and 

Bx .= X — = X — x', we can then write 

x = x' + Bx, (57) 

where x' has an exactly i^-block-sparse representation in the dictionary ^ and we expect Bx to be 
small. 

Wc can more formally bound the size of Bx- For example, under the multiband random process 
model for x{t) described in Theorem 4.4, wc will have E [Ilea;!!!] — 2W '^e=k^ ^^NW setting the 
number of columns per band k to be on the order of 2NW, we can make this error small relative 
to E [!!a;!!|] = A^. For example, if we take k = 2NW{1 — e) for some e G (0, 1), Corollary 4.1 allows 
us to conclude that 

K ^"^ 

E [WexWl] < ^ E ^n]w < KN{e + Cie"^^^). (58) 

e=k 

for N > Nq. The rightmost upper bound in (58) can be made as small as desired (relative to 
E [lla?!!!] = N) by choosing e sufficiently small and sufficiently large. 

For any value of /c, we can also establish a tail bound on ||ea;||2, guaranteeing that it is unlikely 
to significantly exceed the quantity ^ S^Iife^ "^^nw Using standard concentration of measure 
arguments for siibgaussian and subexponential random variables (sec [76] for a thorough discussion) , 
one can make the following guarantee on the squared norm of a Gaussian random vector. 

Lemma 5.3. Let z G C'^ be a complex-valued Gaussian random vector with mean zero. Then 

P [|||.||^ - E [IWli] I > ,E [INIi]] < 2 exp {-min (^g;^. } . 

where C2 is a universal constant and {Xn} denote the eigenvalues of the autocorrelation matrix of 
the length-2N random vector [Re(z)-^ Im{z)^]^ . 

If we assume in our multiband random process model that x{t) is a Gaussian random process, 
this will imply that £C is a Gaussian random vector, and since Bx is a linear transformation of x, Bx 
will be Gaussian as well. This allows us to apply Lemma 5.3 to the quantity ||ea;|||. Using the fact 
that > IIAII2 > WMlcxi vector A, we can pessimistically simplify this bound to read: 

< 2 exp {-mm (J 1)}. (59) 



K 



N-l 



5.2.3 Combined guarantees 

To put these two principles together, we note that when taking noise-free compressive measurements 
Ax of a signal vector x obeying (57), we will have Ax = Ax' + Abx- This allows us to invoke 
Theorems 5.3 and 5.4 with e := Abx,^^ and when the number of columns per band k is chosen so 
that we expect Bx to be small, the concentration of measure phenomenon tells us that e should 

^"One could easily incorporate actual measurement noise into e as well (and thus extend Theorems 5.5 and 5.6 to 
the case of noisy measurements). However, for the sake of clarity we simply set e := Aeas in order to highlight the 
impact of modeling error in our main results. 
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be small as well. In particular, note that for fixed ex and random subgaussian A, (11) guarantees 
that 



\e\\l 



x\\2 



< 4e-^i('')^. (60) 



Then, for example, if we let let x denote the estimated signal vector recovered via block-based IHT, 
(60) allows us to write 

- x\\2 < \\x - x'\\2 + \\x' - x\\2 < Wexh + ||e||2 < (1 + «l(l + '?)^^^)||ea;||2, 

where the second inequality follows from Theorem 5.3 and the third inequality holds with probability 
at least 1 — 4e~'^i(^)^. Combining this fact with the tail bound (59), we can establish the following 
guarantee. 

Theorem 5.5. Let k £ {1,2,.. .,N}, set D = kJ = and let be the N x D multiband 

^band 

m,od,ulated DPSS dictionary defined in (49). Fix 6 € (0, 0.2) and /3 € (0, 1), and let A be an M x N 
subgaussian matrix with M satisfying (53). If x{t) is a Gaussian random process obeying the K- 
band model described in Theorem ^.4., x G is generated by sampling x{t) as in Theorem 4-4> 
and we use block-based IHT (23) with ^ G [1 + 5, 1.5(1 - 5)) to recover an estimate of x from the 
observations y = Ax, then the resulting estimate x will satisfy 

K ^"^ 

\\x -x\\l<{l + «i(l + vf'^fil + 7)^ E >^N,w (61) 

l=k 

with probability at least 1 - /3 - 4e-^i('')*^ - 2e-™"^(T'/^2.7/c2)^ y;/jere W = S^i^dli specified in the 
dictionary construction (49), ki > 1 is a constant as specified in Theorem 3.1, ci is a constant as 
specified in (11), and C2 is a universal constant. 

Although the above result holds for any k G {1,2,..., iV}, it is perhaps most interesting when 
the number of columns per band k is chosen to be on the order of 2NW . For example, if we take 
k = 2NW{\ - e), (61) will read 

\\x - x\\l < (1 + Ki(l + nf'^f{l + 7)(e + Cie-^''^)KN. (62) 

Supposing that K, ni, rj, and 7 arc fixed, one can make the upper bound appearing in (62) as small 
as desired (relative to E [HxHl] = A^) by choosing e sufficiently small and N sufficiently large. 
Specifically, the term e + Cie"'^^^ can be made arbitrarily close to zero by choosing e sufficiently 
small and N sufficiently large. Of course, as one increases the length N of the signal vector, the 
requisite number M of measurements will increase proportionally (this is captured in (53) via the 
dependence on D). What is important about the measurement bound is that the permitted 
undersampling ratio relative to the Nyquist sampling rate (supposing Tg = 5^), will scale like 

N ~ Bnyq 

In other words, we need only collect compressive samples at a rate proportional to i^5band) the 
total amount of occupied bandwidth (i.e., the so-called Landau rate [41]). 



^'^One could also use Lemma 5.3 to guarantee that, with high probability, ||x||i will not be too small compared to 
N. We omit these details in the interest of conciseness. 
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For sufficiently small k, wc can establish a similar guarantee for block-based CoSaMP. Let a 
be tfie recovered coefficient vector, and define x := *S. Then we can write 



— 2II2 < ||£c — aj'IU + ||^^'Q;' — Nfcil 



2 



< lle^lls + VTTsiVCpV^I 



a' — SII2 



< \\ea^h + K2\/l + 3NCl^^e~^\\e\\2 

< (1 + K2{1 + 3iVC|/'e-^)V2(i + r/)i/2^||e,||2, 

where the second line follows from Lemma 5.2, the third line follows from Theorem 5.4, and the 
last line holds with probability at least 1 — 4e^'^^^^'^^ . Combining this fact with Corollary 4.1 and 
the tail bound (59), we can establish the following guarantee. 



Theorem 5.6. Let k = 2NW{1 - e), set D = kJ = N{1 - e)BnyqTs < N, and let * be the 

1/2 _£2^ 

N X D multiband modulated DPSS dictionary defined in (49). Fix 6 G (0,0.1 — QNC-^ e 2 ) 
and P G (0,1), and let A be an M x N subgaussian matrix with M satisfying (55). If x{t) 
is a Gaussian random process obeying the K-band model described in Theorem 4-4) ^ ^ is 
generated by sampling x{t) as in Theorem 4-4} '"■^e block-based CoSaMP (Algorithm 2) with 
S = BBCoSaMP(A*, /, y, ivT) to recover an estimate of a. from the observations y = Ax, and we 
set x := \f S, then the resulting estimate will satisfy 

\\x - x\\l < (1 + K2(l + 3NCl^^e-^)^/\l + r/)^/2)2(l + ^)(e + Cie-^'''^)KN (63) 

with probability at least 1 — /3 — 4e~'^i*^^')^ — 2e~™™*^'''^/^2''^/'^^'', where K2 > 1 is a constant as specified 
in Theorem 3.2, Ci and C2 are constants as specified in Lemma 4-1, ci is a constant as specified 
in (11), and C2 is a universal constant. 

Once again, supposing that K, K2, ri, and 7 are fixed, one can make the upper bound appear- 
ing in (63) as small as desired (relative to E [Hailll] = N) by choosing e sufficiently small and N 

sufficiently large. Specifically, the terms ZNC^ e 2 and e -|- C\e can be made arbitrarily 

close to zero by choosing e sufficiently small and iV sufficiently large. Thus, we have again guaran- 
teed that most finite-length sample vectors arising from multiband analog signals can — to a very 
high degree of approximation — be recovered from a number of compressive measurements that is 
proportional to the underlying information level. 



6 Simulations 

In this section we present the results of a suite of simulations that demonstrate the effectiveness of 
our proposed approaches to multiband signal recovery from compressive measurements. In doing so, 
we address various practical considerations that arise within our framework. All of our simulations 
were performed via a MATLAB software package that we have made available for download at 
http://www.mines.edu/~mwakin/software/. This software package contains all of the code and 
results necessary to reproduce the experiments and figures described below, but should additionally 
serve as a platform upon which other researchers can test and develop their own extensions of these 
ideas. 

6.1 Implementation and experimental setup 

We begin with a brief discussion regarding our implementation and experimental setup. 
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6.1.1 Computing P (a;, if) 



We first recall that a key step in solving either block-based IHT (specifically, the variation in (23)) or 
block-based CoSaMP (specifically, the variation x = BBCoSaMP(A, ^,y,K)) is the computation 
of V{x,K), i.e., the projection of x onto the set of X-block sparse signals in the dictionary ^ 
(as defined in (22)). If * were an orthonormal basis, this projection could be computed simply 
by taking and setting to zero all but the K blocks of this vector with the highest energy. 

Unfortunately, the columns of the multiband modulated DPSS dictionary * (as defined in (49)) 
are not orthogonal, and so this thresholding approach is not guaranteed to even approximate the 
correct solution. In fact, when the number of columns per band k exceeds 2NW{l — e), ^ will fail to 
act as an isometry (recall Lemma 5.2), and when k exceeds 2NW, ^ will actually be overcomplete. 
As we will see in our experiments, it can often be desirable to choose k to be larger than 2NW, 
but this lack of isometry and this overcompleteness prevent us from applying any of the standard 
arguments from sparse approximation to solving the projection problem. 

It is important to note that in (22) we are seeking a vector z that minimizes ||£C — z||2 and has 
a block-sparse representation in ^. Although z will be unique, the corresponding representation in 
^ need not be unique if ^ fails to satisfy the block-RIP. Unfortunately, there seems to be relatively 
little known about solving this type of sparse approximation problem. Nevertheless, since we are 
ultimately interested in z (not its representation with respect to *) this should not necessarily 
stop us from proceeding under the hope that standard sparse approximation algorithms can still 
succeed even when ^ is moderately overcomplete. Thus, in our simulations, we use block-OMP 
to obtain an approximation to V(x,K). Block-OMP is a straightforward generalization of the 
classical OMP algorithm. It proceeds by: (i) initializing the residual vector r = x and the set 
X = 0, (ii) computing the proxy vector h = ^^r, (in) identifying the block in h that has the 
largest energy and adding this block to the set I, and (iv) orthogonalizing r against the columns 
in ^'x- This last step is equivalent to the "update" step in CoSaMP in Algorithms 1 and 2. Steps 
(ii) through (iv) are repeated until termination. The final output can be computed from x — , 
where is the residual after K iterations. 

We close by noting that the lack of a provable technique for computing V{x, K) represents an 
important gap between some of our theory and practice. Specifically, one of our main results (The- 
orem 5.5) pertains to block-based IHT (23), but this is an algorithm that we can only implement 
approximately. As noted at the end of Section 3.2.3, we conjecture that one could also establish 
theoretical guarantees for the BBCoSaMP(A, 'J', i^) version of block-based CoSaMP, but this 
too relies on being able to compute V{x,K) exactly. In light of this, we point out the following. 
First, we are able to implement the BBCoSaMP(A^, J, y, K) version of block-based CoSaMP ex- 
actly because this problem can be solved with simple block thresholding of the vector a, and one 
of our main results (Theorem 5.6) does pertain specifically to BBCoSaMP(A'I', /, i^). Second, 
our simulations in Sections 6.2 and beyond, which rely on block-OMP for approximating V{x,K), 
indicate that we are clearly finding high quality solutions to this problem despite the lack of prov- 
able guarantees. What is interesting is that our experimental results seem most favorable when k 
exceeds 2NW by a nontrivial amount, and that is a regime where the dictionary ^ is overcomplete. 
We hope to further address the implications of overcomplete ^ in future work. 

6.1.2 Regularized least-squares 

In our simulations below, we focus exclusively on testing the two versions of block-based CoSaMP. 
One could conceivably expect similar experimental results using a properly-tuned version of block- 
based IHT, but since we lack theory concerning V{x,K) we find it more worthwhile to devote 
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our space to exploring the various practical issues surrounding block-based CoSaMP. Specifically, 
let us note that when solving block-based CoSaMP, the key steps in this algorithm consist of 
solving the least-squares problem in the "update" step and of computing V{x,K) (twice), which 
also involves solving (potentially multiple) least-squares problems. Thus, it is crucial that we solve 
these least-squares problems efficiently and accurately. 

While there has been much work in the CS community on solving these problems efficiently 
(for example, see [50] ) , we leave aside the issue of speed for the moment and focus instead on the 
issue of stability. In our context, we must take some care in solving these problems, since when the 
number of columns k per band becomes becomes much larger than 2NW ^ the matrices involved 
can become potentially rank deficient and thus standard approaches can be numerically unstable. 
Fortunately, this instability can be easily addressed using relatively simple techniques. 

We first consider the least-squares problem that must be solved in the computation oiV{x,K). 
In this case we will not focus specifically on the block-OMP algorithm that we implement, but 
simply assume that we have some technique for obtaining an estimate X of S{x, K). At that point, 
solving (22) is relatively straightforward. One could solve this via 'P{x, K) = ^x^xX, but when I 
contains indices corresponding to adjacent bands, *x can be nearly rank deficient. In this case, a 
better approach is to construct a reduced basis U for TZ{^x)- One can then compute the projection 
simply via V{x, K) = UU^x. In our context, we use this reduced basis approach to perform the 
orthogonalization in each step of the block-OMP approach to computing V{x,K). 

We now turn to the main least-squares problem in the "update" step of CoSaMP. In this step 
we are given a set of blocks I and wish to solve a problem of the form 

X = arg min \\y — Az\\2 s.t. z G TZ{^x)- (64) 

z 

Note that in this case we require more than simply the projection of y onto TZ{A^x) — we wish to 

actually calculate the vector z corresponding to this projection. This requires a different approach 
than the simple reduced basis approach described above (although constructing a reduced basis 
for 7l{'^x) can still be useful in this context). In our simulations we regularize (64) via Tikhonov 
regularization [55, 70, 71]. The key idea is to replace (64) with 

5 = arg min Ijy — A2;||2 s.t. 2 G 7^(^'x), II2II2 < 7- (65) 

z 

The constraint ||z||2 < 7 significantly improves the conditioning of this problem when A^x is 
ill-conditioned. In our simulations below, we use the toolbox of [38] to efficiently solve (65). Note 
also that in our setting, the parameter 7 can be easily set using the norm of the original signal x 
that we are acquiring. In all simulations below (using either variation of block-based CoSaMP), 
we assume that an upper bound on ||£c||2 is known. In practical settings such an upper bound will 
usually be available, but if necessary one could also estimate this upper bound from ||y||2- 



6.1.3 Experimental setup 

In all of the experiments below, we assume that Tg = n — and that there are J = „ "^'^ = 256 

-^nyq -Dband 

possible bands. For each value of k that we consider, we set D = kJ and let * be the N x D 
multiband modulated DPSS dictionary defined in (49). The digital half-bandwidth parameter W 
is set to he W = -^b^nd^s _ g^j^^j consider sample vectors with length N = 4096, so that 
2NW = 16. 

In our experiments we generate our sampled multiband signal vectors x by selecting the positions 
of the K occupied bands uniformly at random from the J possibilities, and then within each 
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band adding together 50 complex exponentials with frequencies selected uniformly at random from 
within the frequency band (not aligned with the "Nyquist grid"). Each complex exponential in 
this summation is given a random amplitude and phase via multiplication by a complex Gaussian 
random variable. There are a variety of other possibilities for generating test signals which we 
considered and for which we have observed essentially the same results as presented below. 

We will typically report our results in terms of the SNR of the recovery, which in this context 
is defined as 



where x is the original input to the measurement matrix and x is the recovered estimate of x 
provided by CoSaMP. In all cases where we plot the SNR, what we actually show is the result of 
50 independent trials (with a new signal for each trial). Rather than plotting the mean SNR, we 
plot the contour for which 95% of trials result in an SNR at least as large as the level indicated (so 
that only 5% of trials result in a worse performance than indicated) . 



We begin with a comparison between the two versions of block-based CoSaMP discussed in Sec- 
tion 3.2— specifically, BBCoSaMP(A, y, K) and BBCoSaMP(A*, I, y, K). Recah that the key 
difference between these approaches is that the former essentially tries to recover x directly, while 
the latter instead attempts to first recover cx and then estimates x as ^a. Since less is known about 
the theoretical properties of BBCoSaMP(A, ^ , y, K) (both in and of itself and with respect to the 
computation of V{x,K)), we have focused more on BBCoSaMP(A*, /, i^T) in our theoretical 
development. However, all of our theorems regarding BBCoSaMP(A*, J, y, K) (e.g.. Theorems 5.4 
and 5.6) are limited to the case where the number k of DPSS vectors per band used to construct 
the multiband modulated DPSS dictionary ^ satisfies k < 2NW, and as we will see shortly, in 
practice we obtain significant improvements in performance by considering k > 2NW. 

In our experiments we evaluate both versions of block-based CoSaMP as a function of k. For 
the purposes of this experiment, we assume that there are K = 5 active bands, we set the number 
of measurements M = 512, and we construct the measurement matrix A to be M x with 
i.i.d. Gaussian entries with mean zero and variance j^. In this experiment, we take y = Ax 
and do not add noise to the measurements. In Figure 5(a) we see that when k is small the 
two approaches perform similarly, but when k exceeds 2NW by more than a small amount, the 
performance of BBCoSaMP(A, y, K) is far superior to that of BBCoSaMP(A*, /, y, i^). In 
fact, when k becomes large we observe that for at least 5% of trials BBCoSaMP(A*, /, y, i^) 
results in an SNR of almost dB (which can be achieved simply via the trivial estimate x = 0.) 
This gap is further illustrated in Figure 5(b), which shows the probability that the performance of 
BBCoSaMP(A*, /, y, K) is within 3 dB of BBCoSaMP(A, y, K) as a function of k. This also 
begins to rapidly decay when k exceeds 2NW. For a point of reference, in Figure 5(c) we plot the 

(k) 

eigenvalue corresponding to the first DPSS basis vector which is not used in constructing 

our dictionary This value is the dominant term in the approximation error that we can expect 
when using * to represent multiband signals. We see that BBCoSaMP(A, y, if ) continues to 

(k) 

improve with increasing k, roughly until A]y approaches the level of machine precision. 

Our experimental results suggest that in our analysis of BBCoSaMP(A'J', /, y , K) we are correct 
in requiring that k be relatively small, as the algorithm does indeed break down when k becomes too 
large. However, before this breakdown the performance can be quite favorable, with recovery SNR 
exceeding 88 dB, and Theorem 5.6 does guarantee even better performance were N to increase. 




6.2 BBCoSaMP(A, *, y, K) versus BBCoSaMP(A*, /, y, K) 
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Figure 5: Comparison of BBCoSaMP(A, *, y, /-iT) to BBCoSaMP(A*, /, y, iC). (a) The recovery SNR 
of BBCoSaMP(A, y, K) and BBCoSaMP(A*, J, y, K) as a function of the number k of DPSS vectors 
used per band in constructing the muhiband modulated DPSS dictionary VE'. (b) The probabiUty that 
the performance of BBCoSaMP(A*, /, y, K) is within 3 dB of BBCoSaMP(A, y, K) as a function of k. 
(c) The eigenvalue corresponding to the first DPSS basis vector that is not used. 

We speculate that the breakdown in the performance of BBCoSaMP(A^, /, j/, i^) as k grows 
is likely due to the fact that, for large enough k, the dictionary ^ begins to contain highly coherent 
columns, so that any method that attempts to recover a itself is likely to encounter significant 
problems. However, the strong performance of BBCoSaMP(A, y, /C) (with no clear limitation 
on the size of k) seems to suggest that this latter approach likely satisfies the kinds of guarantees 
provided for block-based IHT in Theorems 5.3 and 5.5, which provide for arbitrarily large k and 
hence arbitrarily accurate recovery. In light of these results, for the remainder of our experiments 
we focus exclusively on BBCoSaMP(A, y, K). 

6.3 Impact of measurement noise 

In most realistic scenarios, the compressive measurements y will be subject to various sources of 
noise (including noise in the signal itself, noise within the sensing hardware, quantization, etc.) As 
noted in Section 5.2.3, our approach to signal recovery inherits the same robustness to noise that is 
exhibited by traditional CoSaMP. To illustrate this, we consider the case where the measurements 
y are corrupted by additive white Gaussian noise, i.e., y = Ax + e where each e ~ A/'(0,cj^/). 
In our experiments we consider three different noise levels, quantified by the "measurement SNR" 
(MSNR), which is defined as 

MSNR = 201ogio (^^^J 

In general, the theoretical guarantees for this scenario suggest that the SNR of the recovery should 
be roughly comparable to the MSNR. 

The results of our experiment are illustrated in Figure 6. In this case we are still assuming that 
there are K = 5 active bands, we use an i.i.d. Gaussian matrix A with M = 512 rows, and we 
plot the performance as a function of k. We observe that the results are essentially the same as 
in Figure 5(a), but that as we increase k the recovery SNR will hit a plateau dictated by the best 

^^Note that the case where the signal is corrupted with white noise as opposed to the measurements can be reduced 
to the case where the signal is noise-free and the measurements are noisy, and thus we restrict our attention to noisy 
measurements. See [11,22] for further discussion of this equivalence and the challenges posed by signal noise. 
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-i-MSNR = lOOclB 
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Figure 6: Impact of measurement noise on the signal recovery SNR as a function of k. 



possible SNR achievable for a given MSNR. Roughly speaking, when k is small, performance is being 
limited by "modeling error", but as k increases we eventually reach a regime where measurement 
noise surpasses modeling error as the limiting factor, and no further gains are possible by increasing 
k. Thus, in practice it will typically be the noise level that dictates the optimal choice in k. For 
the remainder of our experiments we wish to avoid any assumptions about the noise level, and so 
we restrict our attention to noise-free measurements. However, the results all translate to the noisy 
setting roughly as one would expect based on these results. 



6.4 Required measurement rate 

We now study the performance of our approach as a function of the number of measurements M. 
Specifically, we consider the cases where K = 5, 10, and 15 bands are active, and for each value of 
K we let M vary from 2NWK (the Landau rate) up to lANWK (oversampling the Landau rate 
by a factor of 7). The results are shown in Figures 7(a) and (b), which plot the results in terms of 
2N^K respectively. We observe that when the measurement rate is only 3 or 4 times that 

of the Landau rate, we are already doing extremely well, and we obtain near-perfect recovery at 6 
times the Landau rate. We observe very similar behavior for all values of K. Thus, for the sake 
of simplicity, we restrict the remainder of our experiments to the case where there are only K = 5 
active bands. 

Before moving on, we note that an important caveat to these results is that as we vary M and 
it is sometimes necessary to adjust the value of k used in the construction of This can easily 
be seen by considering the regime where the measurement rate is very close to the Landau rate. 
Suppose M = 2NWK and that we knew a priori which bands were occupied. In this case we could 
perform recovery using the appropriate submatrix of which would have kK columns. Thus, 
if A; > 2NW, then we would have only M = 2NWK measurements and would need to estimate 
kK > M values — a situation we would clearly like to avoid. Moreover, if we must also estimate 
the support from the data as well, then it is clearly best to avoid setting k to be too large when M 
is small. Thus, in the experiments for Figure 7 (and for those below) we set k based on a rule of 
thumb that we determined based on empirical performance. Specifically, we set k = 2NW when 



2NWK — ^- ^'^^ 2NWK ^ t^'^l' linearly increase k from 2NW = 16 to, in our case /c = 38 
(which represents the rough point at which the first omitted eigenvalue A]y reaches the level of 
machine precision). For larger values of M there is no performance gain by considering k larger 
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Figure 7: Recovery performance as a function of the number of measurements M for K = 5, 10, and 15 
active bands, (a) Performance in terms of 2nwk ' ^^Jci measures oversampling with respect to the Landau 



rate, (b) Performance in terms of the actual number of measurements M . 



than this level. 

6.5 Alternative measurement architectures 

As promised in Section 3.1, we now evaluate the performance of our approach using more practical 
measurement schemes. We compare the performance achieved using an i.i.d. Gaussian matrix to 
that achieved using an M x random demodulator matrix [73] and to a random sampling approach 
where M samples are taken uniformly at random on the length-A Nyquist grid. The results are 
shown in Figure 8. We observe very similar performance among the three approaches, with the 
Gaussian matrix performing best, then the random demodulator, followed by random sampling. 
Note that while there is certainly a gap between these three approaches, it is also most pronounced 
in a regime which is likely irrelevant in practice, since it is rare to find applications where a recovery 
SNR of 200 dB or more is feasible. 

6.6 Comparison with DFT-based approach 

Finally, we close with a comparison between the performance achievable using our proposed multi- 
band modulated DPSS dictionary ^ and the performance achievable using the A x A DFT as 
a sparsifying basis. We test both dictionaries using an M x A random demodulator measure- 
ment matrix [73], which was originally designed with frequency-sparse signals in mind (but under 
a multitone signal model, not a multiband model). Due to DFT leakage effects, we believe that 
it would be inappropriate to break the DFT dictionary into bands and use a block-based recovery 
algorithm; thus when ^ is the DFT basis, we use x = CoSaMP(A, ^ S) as our recovered signal 
estimate. In order to give the DFT approach the best possible chance for success, for each value 
of M we consider a range of possible values for the sparsity parameter S, selecting the value of S 
that achieves the best possible performance according to an oracle. 

The performance gap between these two approaches — illustrated in Figure 9(a) — is monumen- 
tal. Using the DFT dictionary, we never achieve a recovery SNR better than 20 dB over this 
range of measurements, whereas using the multiband modulated DPSS dictionary with block-based 
CoSaMP, the recovery SNR can exceed 200 dB. 
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Figure 8: Comparison of recovery SNR using i.i.d. Gaussian measurements, random demodulator measure- 
ments, and random samples. 

To illustrate the typical difference between these two approaches, in Figure 9(b) we show in 
blue (solid line) the DTFT of a representative signal from one trial in this experiment. Plotted 
in red (dashed line) is the DTFT of the signal vector recovered using the DPSS dictionary with 
2j^j^ = 4; the recovery SNR is 109 dB and the recovered signal is visually indistinguishable from 
the original. Plotted in green (dots) are the DFT coefficients of the signal vector recovered using 
the DFT dictionary with the same measurements; the recovery SNR is now only 13.4 dB. While 
the DFT-based estimate does successfully capture the main peaks of each band, it naturally misses 
all of the sidelobes of each band (and despite the multiband nature of x{t), these sidelobes are 
important for accurately representing the window x). Moreover, due to modeling error, the DFT- 
based approach also results in a number of spurious artifacts in regions where there is no significant 
frequency content in the original signal. 

We note that for the DFT-based approach in this experiment, the best estimate produced 
using CoSaMP(A, y, 5) was observed when setting S = 85. On the other hand, the DPSS 
approach (according to our rule of thumb) selects k = 27, so that in this approach there are a 
total of kK = 135 free parameters. It is not surprising that the DPSS approach results in superior 
performance since the model has so many extra dimensions. What is potentially surprising is that 
if we set S = 135, the DFT approach exhibits a complete breakdown and is unable to make any 
practical use of these extra degrees of freedom. 

7 Conclusions 

There are likely to be many ways that one could bridge the gap between the discrete, finite frame- 
work of CS and the problem of acquiring a continuous-time signal. In this paper, using a dictionary 
constructed from a sequence of modulated DPSS basis elements, we have argued that when dealing 
with finite-length windows of Nyquist-rate samples, it is possible to map the multiband analog 
signal model — in a very natural way — into a finite-dimensional sparse model. This allows us to 
apply many of the standard theoretical and algorithmic tools from CS to the problem of recovering 
such a sample vector from compressive measurements. Moreover, the sparse signals that we en- 
counter in this model actually have a structured sparsity pattern (namely, block-sparsity) , which 
we can exploit through the use of specialized "model-based" CS recovery algorithms. Although 
our recovery bounds are qualified — we have showed that most finite-length sample vectors arising 
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Figure 9: Comparison of recovery performance between block-based CoSaMP with the multiband modulated 
DPSS dictionary and standard CoSaMP using the DFT basis, (a) Performance comparison as a function of 
(b) The DTFT of a representative signal, and the recovered estimate using each approach. 
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from multiband analog signals can be highly accurately recovered from a number of measurements 
that is proportional to the Landau rate — these qualifiers are ultimately necessary. Moreover, we 
have demonstrated through a series of careful experiments that the multiband modulated DPSS 
dictionary can provide a far superior alternative to the DFT for sparse recovery. Our experiments 
also confirm that certain practical measurement architectures such as the random demodulator, 
which was previously viewed only as a mechanism for capturing multitone signals [44], can indeed 
be used to efficiently capture multiband signals. 

As the reader will note from our discussions regarding the selection of the number of columns 
per band A;, the computation of V{x,K), etc., there are certain subtle challenges in choosing the 
proper dictionary design and implementing an effective recovery algorithm. However, our experi- 
mental results do confirm that it is possible to navigate these waters. We have also aimed to give 
some practical insight into the proper choice of design parameters, and we have made all of the soft- 
ware for our simulations available for download at http://www.mines.edu/~mwakiii/software/. 
Ultimately, in addition to the open algorithmic questions discussed in Section 6, there are many 
open questions concerning the most effective way to construct a multiband DPSS dictionary; one 
could imagine possible advantages to considering multiscale dictionaries (see also [59]), adaptive 
band sizes, overlapping bands, etc. We leave the consideration of these questions to future work. 

It is worth emphasizing that the remarkable efficiency of our DPSS dictionary for sparsely ap- 
proximating finite-length multiband sample vectors is owed entirely to the eigenvalue concentration 
behavior described in Section 4.2.4. It would be interesting to explore other possible problem do- 
mains (beyond multiband signal models) where similar eigenvalue concentration behavior holds. 
Again, we leave the consideration of such questions to future work. 

We conclude by noting that applications of sparse representations abound, and so there are 
many possible settings outside of CS where the multiband modulated DPSS dictionary could be 
useful for processing finite-length sample vectors arising from multiband signals. For example, 
there are several possible applications in compressive signal processing [21], where one can attempt 
to answer certain questions about a signal vector directly from compressive measurements of that 
vector without having to actually perform signal recovery. One specific problem in this area could 
involve cancellation of an interfering signal that has corrupted the measurements. Suppose s{t) is 
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an analog signal of interest (not necessarily obeying a multiband model), and i(t) is a multiband (or 
even just single-band) interferer supported on bands indexed by the set X. Let s and i denote the 
sample vectors arising from these two signals, and suppose we collect compressive measurements 
y = A{s+i). Then because i is concentrated in Tl{^x), it follows that Ai is concentrated in Tl{Ax), 

where A = A^. One can therefore define an orthogonal projection matrix P := I — AxA^x- -^Y 
applying P to the measurement vector y, one can remove virtually all of the influence of the 
interferer, since PAi ~ even for very strong interferers, and therefore Py = PAs + PAi ~ 
PAs. From these processed measurements Py one can attempt to recover s or answer various 
compressive-domain questions that do not require signal recovery. It can even be possible to derive 
RIP bounds for PA and thus provide guarantees on the performance of these techniques. We 
refer the reader to [20, 23] for additional discussion of the interference cancellation problem and an 
example using a preliminary version of the modulated DPSS dictionary. 
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A Review of the Karhunen-Loeve (KL) Transform 

We briefly review the basic ideas behind the KL transform [68]. Suppose that r G C''^ is a random 
vector with mean zero and a known autocorrelation matrix denoted by R with entries defined as 



i2[m, n] = E r[n]r[m] 

where the overline denotes complex conjugation. Next suppose that we would like to find, for some 
fixed k G {1,2,..., a fe-dimensional subspace Q of that best captures the energy of r in 
expectation. That is, we wish to find the Q that minimizes 

iE[lk-^g^ll2] 

over all A;-dimensional subspaces of C^. The optimal solution to this minimization problem can be 
found by computing an eigendecomposition of R. In particular, let 

R = UArU" 

denote the eigendecomposition of R, with the eigenvalues Ao(-R) > Ai(-R) > • • • > Xn-i{R) > 
along the main diagonal of Ar. Let Uj- denote the N x k matrix that results from taking the first 
k columns of U. The columns of Uk provide an orthonormal basis for the optimal Q, and thus we 
will have Pq = UkU^ . For this optimal choice of Q, we will also have 

7V-1 



E[||r-PQr||i] = ^A,(fl). 
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B Proof of Theorem 4.1 



Proof. Let /' denote a random variable with uniform distribution on [/c — W,fc + W\, let 9' denote 
a random variable with uniform distribution on [0, 27r), and let r = r{f',0') := e^^ Sf denote a 
random vector in defined in terms of /' and 9'. Then we can write 

— ■ / \\ef - PQCfWl df = / \\ej, - PQef.\\lpf,U')df 

Jf^-w Jfc-W 

= / / \\e^' {ef, - PQef>)gpf,{f)-df' d9' 

Jo Jfc-W ^TT 

= / / lle^'^' Bf, - PQ{e^''ef>)\\lpfif')p0,{9')df' d9' 

Jo Jfc-W 

= E[\\r-PQr\\l]. 

We next verify that the random vector r has mean zero, i.e., for each n G {0, 1, . . . , — 1} we have 

1-277 ffc + w 

E [r[n\] = / / e^'^ e f, [n]pf. {f')pg, (9') df d9' 
J fc-w ^Wtt 

fc + W 



Jfc-W 
271 ffc + w 



1 rfc+W f27r 

I- ■ / e^2-/'" / e^-^ d9' df 
Vtt Jf^_w Jo 



AWtt 
= 0. 

Next, observe that the random vector r has autocorrelation matrix R with entries: 



R[ni,n]=¥, r[n]r[m 
= E 



efi[n]e e^^ 



1 

2W 
1 

2W 



fc + W 



fc-w 



,-j2nf'n^2iTf'm^jl 



ei2T/cm . 2T^sinc {2W{m - n)) ■ g-^'^'^^^". 



Therefore, we can write 



R = • Ef B AT , 

2W ^ 



(66) 



where Ef^ is as defined in (44) and -BAr,w is as defined in (36). Plugging the eigendecomposition 
for B^y^/- from (38) into (66), we obtain 

R = ^ ■ Ef^SN,W^N,wSN,W^fc- 

From this eigendecomposition and standard results concerning the KL transform (see Appendix A) , 
it follows that the optimal A;-dimensional subspace Q will be spanned by the first k columns of 
Ef^SN,w- Furthermore, this choice of Q yields 

7V-1 , N-1 



2W 

as desired 



1 rf-+^ 1 

- • / \\ef - PQSfWl df = E[\\r- PqvWI] = ^ A,(H) = ^ E ^n,w, 



□ 
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C Proof of Theorem 4.2 



Proof. The sequence x[n] := x{nTs),n G Z is a discrete-time, zero-mean, wide sense stationary 
random process with power spectrum equal to 



2W' fc-w<f<fc + w, 

0, otherwise. 



and the vector x G equals the restriction of x[n] to the indices n = 0, 1,...,A^ — 1. The inverse 
DTFT of the power spectrum Pxif) gives the autocorrelation function for x[n]: 

r^[n] = e^'^^''^- ■ sine {2Wn) . 

It follows that the random vector x has mean zero and an autocorrelation matrix R given by (66) . 
Thus, X has exactly the same autocorrelation structure as the random vector r we considered in 
Appendix B, and so the same KL transform analysis applies. Finally, we can also compute 

N-l 

E [11*112] = Y.^ M^f] = ^^^[0] = ^• 

n=0 

□ 



D Proof of Theorem 4.3 

Proof. First, let us define to be an operator that modulates a discrete-time signal by fc\ more 
specifically, 

for all n G Z. Since the modulated DPSS vectors form an orthonormal basis for C^, and since the 
discrete-time signal x[n\ is time-limited, we can write x[n] as 

Af-l 

x=Y^ «M^/c7jv(sSv!w)' 

where the coefficients a are given by a[£] = {x, SfcTNis^^w)) ~ ^fc^^Nw) for £ = 0, 1, ... , N—1 
and satisfy || II 2- By linearity, we can also write 

AT-l 

where the functions S f^^wS f^TNis^^^^) are modulated DPSS's and therefore remain orthogonal. 
Therefore, 

N-l N-l 

{1-S)\\x\\l < \\Bf^,wx\\l = \m'\\B^,w£fM4]w)f2 = E 
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Now note that if A; = 2NW{1 + e), then 



fc-i 



{i-6)\\x\\i<Y,\ocm'x%l^+Y.H^]\'>^. 



12 



£=0 
k-1 



£=k 



N-1 



w 

£=0 e=k 
< \\PQx\\l + \\x\\lNC3e-^''' 

= {\\x\\l-\\x-PQx\\l) + \\xgNC3e- 
for N > Ni. Rearranging terms in the above inequality yields (47). 



■C4N 



□ 



E Proof of Theorem 4.4 

Proof. For each i £ I, the sequence Xi[n] := Xi{nTs),n G Z is a discrete-time, zero-mean, wide 
sense stationary random process with power spectrum equal to 



= J 2WK 

I 0, 



+ i2W, + + l)2W 



otherwise. 



Xi G to be the restriction of Xi[n] to the indices n = 0, 1, . . . , AT — 1, we have x = J2iei 



These discrete-time sequences will also be independent and jointly wide sense stationary. Defining 
Xi G to be the restriction of Xi [n] to 
Pxif) = Y^i^xPxiif)- We can compute 

Af-l N-1 . ^ 

E [||a;||i] = ^ E [\x[n]\'] = ^ E ^ [^^N^] = E E^ [I^^Wl'] =NK.- = N. 



N-l 



n=0 



n=0 i,ieX 



n=0 ieX 



Also, we have 



E ||£C- P*^cc 



E 



= E 



< E 



< E 



< E 



E *M ~ ( E 



I E ll'*'* ~ -P*iaJi||2 1 
Kiel J 

^E 11*^ ~ P^iXi\\2j 



-^E II** ~ p^iX 



2 

i||2 



iex 



= K^E[||a;i-P*.a;i||i] 



iex 



TV-l 
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where the third hne follows from the triangle inequality, the fourth line follows because the columns 
of each belong to as well, the fifth line follows because < -fC||/i||2 for any iiT-dimensional 
vector h, and the final line follows from Theorem 4.2. □ 

F Proof of Lemma 5.1 

Proof. Let 'I'j^ and ^i,^ denote the blocks from which Qi and respectively, are drawn. If ii = 12, 
we have already established at the beginning of Section 4.3.1 that (91,92) = 0- Thus, we assume 
henceforth that ii ^ 12- 

Let £f. denote an operator that modulates a discrete-time signal by fi; more specifically, 
£f^{s)[n\ = e^'^'^f^'^s[n\ for all n G Z. Let Bf^^w denote an orthogonal projection operator that 
takes a discrete-time signal, bandlimits its DTFT to the frequency range f £ [fi — W, fi + W], and 
returns the corresponding signal in the time domain. Finally, define B'j^^ := I—Bf._^^w—Bfi^^w, 

which is also an orthogonal projection for finite-energy signals because — fi^] > For some 
e, ^' G {0, 1, . . . , 2NW{1 - e) - 1}, we can write 

£fn ^Ns'i]w ' £fi2 '^Ns'ilv 
^fn,w£fi2'^NS^^W + ^fi2<W^fi2'^l^^N!w + ^%,Ji2,W^fi2'^N^N^w) 

^fn,w£fi^TNS^^]w,Bf^_^^w£fi^TNs'ijy'^ + {Bf^^^w£fi^TNS^^]w, 
From (32), (34), and the fact that \fi^ - /j J > 2W , it follows that 

I (91,92) I < \f>^ Vi - Aij;i^ + ^1 - \^s,w ■ - ^N,w - ^n!w 

Finally, (51) follows by bounding the s/X terms in this expression by 1 and by bounding the Vl — A 
terms in this expression using (39). □ 
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